Wavelets
February 3, 1999
– Can’t you look for some money somewhere? Dilly said.
Mr Dedalus thought and nodded.
– I will, he said gravely. I looked all along the gutter in O’Connell street. I’ll
try this one now.
James Joyce, Ulysses
Preface
In addition, there have appeared a few books on the subject aimed at the general
public.
Contents

1 Introduction
  1.1 The Haar Wavelet and Approximation
  1.2 An Example of a Wavelet Transform
  1.3 Fourier vs Wavelet
  1.4 Fingerprints and Image Compression
  1.5 Noise Reduction
  1.6 Notes

I Theory

2 Signal Processing
  2.1 Signals and Filters
  2.2 The z-transform
  2.3 The Fourier Transform
  2.4 Linear Phase and Symmetry
  2.5 Vector Spaces
  2.6 Two-Dimensional Signal Processing
  2.7 Sampling

3 Filter Banks
  3.1 Discrete-Time Bases
  3.2 The Discrete-Time Haar Basis
  3.3 The Subsampling Operators
  3.4 Perfect Reconstruction
  3.5 Design of Filter Banks
  3.6 Notes

4 Multiresolution Analysis
  4.1 Projections and Bases in L2(R)
  4.2 Scaling Functions and Approximation

II Applications

8 Wavelet Bases: Examples
  8.1 Regularity and Vanishing Moments
  8.2 Orthogonal Bases
  8.3 Biorthogonal Bases
  8.4 Wavelets without Compact Support
  8.5 Notes
Chapter 1
Introduction
We ask the reader to think of, for example, a sound signal as recorded by a microphone. We do this since, for most people, it is helpful to have a specific concrete application in mind when getting acquainted with unfamiliar mathematical concepts.

In this introduction, we attempt a first approximate answer to the following questions, making some comparisons with the classical Fourier methods and presenting a few basic examples. In subsequent chapters, we answer them in greater detail.
where ψj,k(t) = 2^{j/2} ψ(2^j t − k) are all translations and dilations of the same function ψ.

In general, the function ψ is more or less localized both in time and in frequency, and satisfies ∫ψ(t) dt = 0: a cancellation/oscillation requirement. If ψ is well localized in time, it has to be less so in frequency, due to the following inequality (coupled to Heisenberg's uncertainty relation in quantum mechanics).
$$(1.1)\qquad \int|\psi(t)|^2\,dt \;\le\; 2\left(\int|t\,\psi(t)|^2\,dt\right)^{1/2}\left((2\pi)^{-1}\int|\omega\,\hat\psi(\omega)|^2\,d\omega\right)^{1/2}$$
from the frequency (octave) band Ij = {ω : 2^{j−1} + 2^{j−2} < |ω/π| < 2^j + 2^{j−1}}. In the latter sum, each term is localized around t = 2^{−j}k if ψ(t) is localized around t = 0. The frequency content of ψj,k is localized around ω = 2^j π if the function ψ has its frequency content mainly in a neighbourhood of ω = π. (Recall that ∫ψ(t) dt = 0 means ψ̂(0) = 0.) For the Haar wavelet, shown in Figure 1.1, the modulus of the Fourier transform is 4(sin ω/4)²/ω, ω > 0, illustrating this.
In contrast, the harmonic constituents eiωt in the Fourier representation
have a sharp frequency ω, and no localization in time at all.
Thus, if hf, eiω· i is perturbed for some ω in the given frequency band Ij ,
then this will influence the behaviour at all times.
Conversely, if hf, ψj,k i is perturbed, then this will influence the behaviour
in the given frequency band Ij mainly, and in a neighbourhood of t = 2−j k
with a size comparable to 2−j mainly.
Exercises 1.0
1.1. Prove inequality (1.1), using the identity
where ‖f‖ is defined by

$$\|f\| = \left(\int_{-\infty}^{\infty}|f(t)|^2\,dt\right)^{1/2}.$$
Now ϕ and ψ are orthogonal, as are {ϕ(t − k)}k and {ψ(t − k)}k, in the scalar product ⟨f, g⟩ := ∫f(t)g(t) dt.
Let f (t) = t2 , 0 < t < 1, and f (t) = 0 elsewhere. We may approximate
f by its mean values over the dyadic intervals (k 2−j , (k + 1)2−j ). For j = 2
this is shown in Figure 1.2.
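The dyadic means just described can be computed in closed form; a small sketch (not from the book), using the fact that the integral of t² from a to b is (b³ − a³)/3:

```python
# A sketch (not from the book) of the approximation above: the mean of
# f(t) = t^2 over the dyadic interval (k 2^-j, (k+1) 2^-j) has a closed
# form, since the integral of t^2 from a to b is (b^3 - a^3)/3.

def dyadic_means(j):
    """Means of f(t) = t^2 over (k/2^j, (k+1)/2^j), k = 0, ..., 2^j - 1."""
    n = 2 ** j
    return [(((k + 1) / n) ** 3 - (k / n) ** 3) / 3 * n for k in range(n)]

means = dyadic_means(2)      # the j = 2 step function of Figure 1.2
print(means)
# the average of the means equals the integral of t^2 over (0, 1)
assert abs(sum(means) / len(means) - 1 / 3) < 1e-12
```

For j = 2 the means are 1/48, 7/48, 19/48 and 37/48, and their average recovers the exact integral 1/3.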
[Figure 1.1: the Haar scaling function and the Haar wavelet.]
where the dilated scaling function is normalized to have its square integral
equal to 1.
[Figure 1.2: the approximation of f by its mean values over the dyadic intervals, j = 2.]
express the approximating step function with its mean values over the dyadic
intervals with twice the length 2−(j−1) = 2−1 and, at the same time, record
the difference between the two approximating step functions. Note that the
difference will be expressed in terms of the Haar wavelet on the same doubled
scale in Figure 1.3.
Figure 1.3: The first two successive means and differences at doubled scales.
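The splitting step just described can be sketched in code (normalizing factor suppressed, as in the text): mean values at one scale are turned into means and differences at the doubled scale, and the pair reconstructs the original exactly.

```python
# A sketch of the splitting step (normalization suppressed, as in the
# text): from means at scale 2^-j, form means and differences at the
# doubled scale 2^-(j-1); mean +/- difference recovers the fine means.

def split(means):
    coarse = [(a + b) / 2 for a, b in zip(means[0::2], means[1::2])]
    detail = [(a - b) / 2 for a, b in zip(means[0::2], means[1::2])]
    return coarse, detail

fine = [1/48, 7/48, 19/48, 37/48]      # dyadic means of t^2 for j = 2
coarse, detail = split(fine)
recon = []
for c, d in zip(coarse, detail):
    recon += [c + d, c - d]            # invert the step
assert all(abs(r - f) < 1e-12 for r, f in zip(recon, fine))
print(coarse, detail)
```

No information is lost: the coarse means plus the Haar details carry exactly the content of the finer step function.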
The function is then decomposed in the plot to the left in Figure 1.6 into
independent (orthogonal) parts, a Multi-Resolution Decomposition, each part
essentially (but not exactly) in a frequency octave. In the plot to the right,
the wavelet coefficients are shown correspondingly.
[Figure 1.6: the Multi-Resolution Decomposition (left) and the corresponding wavelet coefficients (right); horizontal axes are time t, vertical axis is the dyad.]
The horizontal axes are time, and the lowest graph represents the uppermost half of the available frequency range, the next graph represents the upper half of the remaining lower half of the frequency range, etc.
Note that the sharp changes in the function are clearly visible in the
resolutions. A corresponding plot of the Fourier spectrum only reveals fre-
quency peaks. Using time windows and doing Fourier transforms for each
window will reveal the same features as the multiresolution decomposition,
but choosing the right window size requires additional information in general.
Moreover, the number of operations in the multiresolution algorithm is linear in the number of samples of the signal, whereas the fast Fourier transform has an additional logarithmic factor.
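The linear operation count can be made concrete with a sketch of the repeated means-and-differences algorithm (normalization suppressed; the sample sequence is the four-point example used in this section):

```python
# A sketch of the full multiresolution algorithm (normalization
# suppressed): repeated means and differences.  Each level touches half
# as many values as the previous one, so the total work is proportional
# to n, in contrast to the n log n of the FFT.

def haar_transform(x):
    x = list(x)
    details = []
    ops = 0
    while len(x) > 1:
        means = [(a + b) / 2 for a, b in zip(x[0::2], x[1::2])]
        diffs = [(a - b) / 2 for a, b in zip(x[0::2], x[1::2])]
        ops += len(x)                 # values processed at this level
        details = diffs + details     # finest details end up last
        x = means
    return x + details, ops

coeffs, ops = haar_transform([0, 0, 1, 0])
print(coeffs)   # overall mean first, then coarse-to-fine differences
assert ops <= 2 * 4                   # linear in the number of samples
```

For the sequence (0, 0, 1, 0) this returns the mean 1/4 followed by the differences −1/4, 0, 1/2, which are exactly the values appearing in the example below.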
where the discrete Fourier sequence thus is (1/4, −1/4, 1/4, −1/4), the first element, 1/4, being the mean of the original sequence. These are calculated in the standard way (xn is the original sequence with x2 = 1):

$$X_k = \frac{1}{4}\sum_{n=0}^{3} x_n e^{-2\pi i k n/4} \qquad (k = 0, 1, 2, 3)$$
where ϕ(t) = 1 for 0 < t < 1 and ϕ(t) = 0 elsewhere. In Figure 1.7, the sample values are thus depicted to the right of their respective indices on the horizontal t axis.
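The Fourier coefficients quoted above can be checked directly; a sketch for the samples with x₂ = 1 and the others zero:

```python
# Checking the stated discrete Fourier coefficients for the four samples
# with x_2 = 1 (the others zero), via X_k = (1/4) sum_n x_n e^{-2 pi i kn/4}.
import cmath

x = [0, 0, 1, 0]
X = [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / 4) for n in range(4)) / 4
     for k in range(4)]
print([round(c.real, 10) for c in X])   # the sequence (1/4, -1/4, 1/4, -1/4)
```

Only one sample is nonzero, so X_k = (1/4)e^{−iπk} = (1/4)(−1)^k, in agreement with the text.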
We will now show how the different frequency bands contribute to the
(different) functions.
Figure 1.7: The Fourier (left) and the Haar (right) representations.
The Fourier component with highest frequency is 1/4 cos πt. The Haar
wavelet component with highest frequency is obtained as follows.1
$$\begin{pmatrix} 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & -1 \end{pmatrix}\begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$$
This measures the difference between the adjacent elements taken pairwise,
and the resulting sequence is encoded in the function
Gx(t) = 0 ψ(t/2) + 1 ψ(t/2 − 1)
which is a high-frequency part, and where
ψ(t) = 1 ϕ(2t) − 1 ϕ(2t − 1)
is the Haar wavelet. Here the two coefficients ±1 are the non-zero entries in
the filter matrix. These two components are shown in Figure 1.8, where the
localization is obvious in the Haar component, but it is not clear from the
Fourier component.
The corresponding means are calculated in the analogous way.
$$\begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \end{pmatrix}\begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$$
This is encoded in the function
Hx(t) = 0 ϕ(t/2) + 1/2 ϕ(t/2 − 1)
¹For notational simplicity, we have suppressed a normalizing factor 2^{1/2}. See Exercise 1.2.
Figure 1.8: The Fourier (left) and the Haar (right) components.
Denoting the filter matrices above by the same letters G, H, and their respective adjoints by G∗, H∗, it is easy to verify that (I is the identity matrix)

G∗G + H∗H = 2I
GH∗ = HG∗ = 0
HH∗ = GG∗ = 2I
The first equation shows that we may recover the original sequence from
the sequences encoded in Gx and Hx, and the middle two express that the
functions Gx and Hx are orthogonal. (Note that the encoded sequences are
not orthogonal as such, but that this is an orthogonality relation between
the columns of G and the columns of H).
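These matrix identities can be verified mechanically; a small sketch with plain lists (not the book's code):

```python
# Verifying the three identities above for the unnormalized Haar filter
# matrices used in this example.

H = [[1, 1, 0, 0],
     [0, 0, 1, 1]]
G = [[1, -1, 0, 0],
     [0, 0, 1, -1]]

def T(A):                      # adjoint = transpose (real entries)
    return [list(r) for r in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

GtG = matmul(T(G), G)
HtH = matmul(T(H), H)
total = [[s + r for s, r in zip(rs, rr)] for rs, rr in zip(GtG, HtH)]
assert total == [[2, 0, 0, 0], [0, 2, 0, 0], [0, 0, 2, 0], [0, 0, 0, 2]]
assert matmul(G, T(H)) == [[0, 0], [0, 0]] == matmul(H, T(G))
assert matmul(H, T(H)) == [[2, 0], [0, 2]] == matmul(G, T(G))
print("G*G + H*H = 2I, GH* = HG* = 0, HH* = GG* = 2I: verified")
```

The asserts confirm exactly the three identities, including the orthogonality between the columns of G and the columns of H.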
For the next highest frequency, the Fourier component is −1/2 cos πt/2.
(Since the sequence is real-valued, X1 is the complex conjugate to X−1 = X3 .)
The corresponding Haar wavelet component is calculated from the previous
level means:
$$\begin{pmatrix} 1 & -1 \end{pmatrix}\begin{pmatrix} 0 \\ 1 \end{pmatrix} = -1$$
This measures the difference between (adjacent pairs of) means. The resulting sequence is encoded in the function (normalizing factor suppressed; see Exercise 1.2)

−1/4 ψ(t/4)
Figure 1.10: The next highest Fourier (left) and Haar (right) components.
The corresponding means are also calculated from the previous level
means:
$$\begin{pmatrix} 1 & 1 \end{pmatrix}\begin{pmatrix} 0 \\ 1 \end{pmatrix} = 1$$

This is encoded in the function 1/4 ϕ(t/4), where ϕ(t/4) ≡ 1 on the interval 0 < t < 4, and which thus represents the mean value 1/4 (see Exercise 1.2) of the original sequence, shown in Figure 1.11.
Exercises 1.3
1.2. Verify that the correct values appear when choosing the normalized functions in the encoding representations: for example, choosing the normalized 2^{1/2} ψ(2t − 1) instead of ψ(2t − 1). 'Normalized' means that the integral of the squared function equals 1 (the L² norm is 1).
1.3. Compare the fast Fourier transform algorithm (FFT) and the above
Haar wavelet transform, with regard to how the non-locality and the locality
of the respective transform appear. This will already be apparent in the
four-point case.
1.4. Work out what happens if another segment is chosen, that is, the four
samples are (0, 0, 0, 1) and x0 = x1 = x2 = 0, x3 = 1. Compare the influence
on the Fourier and the wavelet representations.
1.6 Notes
Wavelet analysis has been used in signal/image processing practice for less
than two decades. Most of the mathematical ideas distinguishing wavelet
analysis from classical Fourier analysis are less than a century old.
http://www.wavelet.org
These suggestions are not meant to be exhaustive.
Part I
Theory
Chapter 2
Signal Processing
In most cases our signals will be real-valued, but for the sake of generality we assume that they are complex-valued. Mathematically, a signal is then a function x : Z → C. Moreover, a signal x ∈ ℓ²(Z) if it has finite energy, that is, Σk |xk|² < ∞. In the next chapter we will have more to say about the space ℓ²(Z) and discrete-time bases.
A filter H is an operator mapping an input signal x into an output signal
y = Hx, and is often illustrated by a block diagram as in Figure 2.1. A filter
H is linear if it satisfies the following two conditions for all input signals x
and y, and real numbers a
(2.1a) H(x + y) = Hx + Hy,
(2.1b) H(ax) = aHx.
[Figure 2.1: block diagram of a filter H with input x and output y.]
A simple example of a linear filter is the delay operator D, which delays the input signal x one step. A delay of n steps is written D^n and is defined as

y = D^n x ⇔ y_k = x_{k−n}, for all k ∈ Z.

For a time-invariant (or shift-invariant) filter, a delay in the input produces a corresponding delay in the output, so that for all input signals x

H(Dx) = D(Hx).

That is, the operator H commutes with the delay operator, HD = DH. From this it also follows that the filter is invariant to an arbitrary delay of n steps, H(D^n x) = D^n(Hx).
Now, suppose H is a linear and time-invariant (LTI) filter. Let h be the
output, or response, when the input is a unit impulse
$$\delta_k = \begin{cases} 1 & \text{for } k = 0, \\ 0 & \text{otherwise}, \end{cases}$$

that is, h = Hδ. The sequence (h_k) is called the impulse response of the filter. Since the filter is time-invariant we have

$$D^n h = H(D^n\delta).$$
So, if we write the input signal x of the filter as

$$x = \cdots + x_{-1}D^{-1}\delta + x_0\delta + x_1 D\delta + \cdots = \sum_n x_n D^n\delta,$$
and use the fact that the filter is linear, we can write the output signal
y = Hx as
$$y = H\Big(\sum_n x_n D^n\delta\Big) = \sum_n x_n H(D^n\delta) = \sum_n x_n D^n h =: h * x.$$
From this we conclude that a LTI filter is uniquely determined by its impulse
response, and that the output y always can be written as the convolution
between the input x and the impulse response h,
(2.3) y = Hx = h ∗ x.
This demonstrates how the properties of linearity and time invariance motivate the definition of convolution. In the literature, it is common to restrict the meaning of the word filter to an operator that is both linear and time-invariant, and to use the word operator in the more general case.
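A sketch of (2.3) in code (not from the book): the output of an LTI filter is the convolution of the input with the impulse response.

```python
# A sketch of (2.3): y_k = sum_n h_n x_{k-n}; the filter h = (1/2, 1/2)
# is the averaging filter discussed in the text.

def convolve(h, x):
    """Convolution of two finitely supported sequences indexed from 0."""
    y = [0.0] * (len(h) + len(x) - 1)
    for n, hn in enumerate(h):
        for m, xm in enumerate(x):
            y[n + m] += hn * xm
    return y

h = [0.5, 0.5]
# feeding in a unit impulse returns the impulse response itself
assert convolve(h, [1.0]) == h
print(convolve(h, [1.0, 2.0, 4.0]))   # running pairwise means
```

The first assert is exactly the statement h = Hδ: the filter is completely determined by its impulse response.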
A finite impulse response (FIR) filter only has a finite number of coeffi-
cients different from zero. If a filter is not FIR, it is called infinite impulse
response (IIR).
A LTI filter is causal if it satisfies hk = 0 for k < 0. Non-causal filters are
often referred to as non-realizable, since they require the knowledge of future
values of the input signal. This is not necessarily a problem in applications,
where the signal values might already be stored on a physical medium such
as a CD-ROM. Further, FIR non-causal filters can always be delayed to make
them causal. Later, when we develop the theory of filter banks and wavelets,
it is convenient to work with non-causal filters.
The correlation x ⋆ y between two signals x and y is another sequence
defined by
$$(2.4)\qquad (x \star y)_k = \sum_n x_n y_{n-k} = \sum_n x_{n+k}\,y_n.$$
Exercises 2.1
2.1. Verify that the up- and downsampling operators are linear but not time-
invariant.
2.2. A filter is stable if all bounded input signals produce a bounded output signal. A signal x is bounded if |x_k| < C for all k and for some constant C. Prove that a LTI filter H is stable if and only if Σ_k |h_k| < ∞.
Let us now see how different operations in the time domain are translated into the z-domain. A delay of n steps corresponds to a multiplication by z^{−n}:

$$(2.6)\qquad x \supset X(z) \quad\Leftrightarrow\quad D^n x \supset z^{-n}X(z).$$
We will use the notation x∗ to denote the time reverse of the signal x, x∗_k = x_{−k}, and we have

$$(2.7)\qquad x \supset X(z) \quad\Leftrightarrow\quad x^* \supset X(z^{-1}).$$
The usefulness of the z-transform is largely contained in the convolution
theorem. It states that convolution in the time domain corresponds to a
simple multiplication in the z-domain.
Theorem 2.1 (The convolution theorem).

$$(2.8)\qquad y = h * x \quad\Leftrightarrow\quad Y(z) = H(z)X(z).$$
The transform H(z) of the impulse response of the filter is called the
transfer function of the filter. This means that we can compute the output
of a LTI filter by a simple multiplication in the z-domain. Often this is
easier than directly computing the convolution. To invert the z-transform one
usually uses tables, partial fraction expansion, and theorems. The correlation
also has a corresponding relation on the transform side
$$(2.9)\qquad y = x_1 \star x_2 \quad\Leftrightarrow\quad Y(z) = X_1(z)X_2(z^{-1}).$$
Example 2.4. Let us again consider the averaging filter from Example 2.1
given by h0 = h1 = 1/2. If we now compute the output in the z-domain we
proceed as follows
$$H(z) = \frac12 + \frac12 z^{-1},$$
$$Y(z) = H(z)X(z) = \frac12 X(z) + \frac12 z^{-1}X(z) \;\Rightarrow\; y_k = \frac12(x_k + x_{k-1}),$$

which is the same result as we obtained in the time domain.
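The convolution theorem in miniature (a sketch, not from the book): multiplying the coefficient sequences of H(z) and X(z), viewed as polynomials in z⁻¹, reproduces the time-domain output.

```python
# Multiplying coefficient sequences of H(z) and X(z) (polynomials in
# z^-1) reproduces the time-domain output y_k = (x_k + x_{k-1}) / 2.

def poly_mul(a, b):
    """Coefficient sequences indexed by the power of z^-1."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

h = [0.5, 0.5]                      # H(z) = 1/2 + (1/2) z^-1
x = [1.0, 3.0, 5.0]
y = poly_mul(h, x)
direct = [(x[k] + (x[k - 1] if k > 0 else 0.0)) / 2 for k in range(len(x))]
assert y[:len(x)] == direct         # same result as in the time domain
print(y)
```

Polynomial multiplication and convolution are the same coefficient operation, which is what Theorem 2.1 expresses.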
Exercises 2.2
2.3. Verify relations (2.6) and (2.7).
2.4. Prove that the correlation between x and y can be written as the convolution between x and the time reverse y∗ of y, that is, x ⋆ y = x ∗ y∗. Then prove relation (2.9).
It follows that X(ω) is 2π-periodic. Note that we, with an abuse of notation,
use the same letter X to denote both the Fourier and z-transform of a signal
x. From the context, and the different letters ω and z for the argument,
it should be clear what we refer to. To obtain the signal values from the
transform we use the inversion formula
$$(2.11)\qquad x_k = \frac{1}{2\pi}\int_{-\pi}^{\pi} X(\omega)e^{i\omega k}\,d\omega.$$
The Parseval formula tells us that the Fourier transform conserves the energy
in the following sense
$$(2.12)\qquad \sum_k |x_k|^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi}|X(\omega)|^2\,d\omega,$$

or more generally

$$(2.13)\qquad \langle x, y\rangle = \sum_k x_k\overline{y_k} = \frac{1}{2\pi}\int_{-\pi}^{\pi} X(\omega)\overline{Y(\omega)}\,d\omega.$$
From the definitions of the Fourier and z-transform, we see that we obtain the
Fourier transform X(ω) from the z-transform X(z) through the substitution
z = eiω . The convolution theorem for the z-transform therefore gives us a
corresponding theorem in the frequency domain
with a different amplitude and phase. Let us see why this is so. If the input
xk = eiωk , where |ω| ≤ π, the output y is
$$y_k = \sum_n h_n x_{k-n} = \sum_n h_n e^{i\omega(k-n)} = e^{i\omega k}\sum_n h_n e^{-i\omega n} = e^{i\omega k}H(\omega).$$
Writing the complex number H(ω) in polar form, H(ω) = |H(ω)|e^{iφ(ω)}, we get

$$y_k = |H(\omega)|\,e^{i(\omega k + \phi(\omega))}.$$
Thus, the output is also a pure frequency, but with amplitude |H(ω)| and a phase delay of −φ(ω). By plotting the magnitude response |H(ω)| and the phase function φ(ω) for |ω| ≤ π, we see how the filter affects different frequency components of the signal. This is the reason for using the word filter in the first place; it filters out certain frequency components of the input signal. A filter with magnitude response constantly equal to one, |H(ω)| = 1, is therefore called an allpass filter: all frequency components of the input signal are unaffected in magnitude (but not in phase).
Example 2.5. For the averaging (or lowpass) filter h0 = h1 = 1/2 we have
$$H(\omega) = \frac12(1 + e^{-i\omega}) = e^{-i\omega/2}(e^{i\omega/2} + e^{-i\omega/2})/2 = e^{-i\omega/2}\cos(\omega/2).$$
From this we see that the magnitude |H(ω)| = cos(ω/2) for |ω| < π. To
the left in Figure 2.2 the magnitude response is plotted. We see that high
frequencies, near ω = π, are multiplied by a factor close to zero and low
frequencies, near ω = 0, by a factor close to one. For the differencing (or
highpass) filter
$$g_k = \begin{cases} 1/2 & \text{for } k = 0, \\ -1/2 & \text{for } k = 1, \\ 0 & \text{otherwise}, \end{cases}$$
we have G(ω) = (1 − e−iω )/2. The magnitude response is plotted to the right
in Figure 2.2. These two filters are the simplest possible examples of low-
and highpass filters, respectively.
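The two frequency responses can be evaluated numerically; a sketch (not from the book) confirming |H(ω)| = cos(ω/2) and |G(ω)| = sin(ω/2) on 0 < ω < π:

```python
# Evaluating the frequency responses of Example 2.5 numerically.
import cmath, math

def freq_resp(flt, w):
    return sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(flt))

H = lambda w: freq_resp([0.5, 0.5], w)    # lowpass (averaging)
G = lambda w: freq_resp([0.5, -0.5], w)   # highpass (differencing)

for w in (0.1, 1.0, 2.0, 3.0):
    assert abs(abs(H(w)) - math.cos(w / 2)) < 1e-12
    assert abs(abs(G(w)) - math.sin(w / 2)) < 1e-12
print(abs(H(0.0)), abs(G(math.pi)))  # H passes w = 0, G passes w = pi
```

The printed values show the complementary behaviour of the pair: the lowpass filter passes ω = 0 with factor one, and the highpass filter passes ω = π with factor one.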
[Figure 2.2: the magnitude responses of the lowpass filter H (left) and the highpass filter G (right).]
Example 2.6. The ideal lowpass filter suppresses frequencies above the cut-off at ω = π/2 completely, and frequencies below this cut-off pass through unaffected. This filter is defined by the frequency response function

$$H(\omega) = \begin{cases} 1, & |\omega| < \pi/2, \\ 0, & \pi/2 < |\omega| < \pi. \end{cases}$$
It follows from the inversion formula (2.11) that the filter coefficients are
samples of a sinc function
$$(2.15)\qquad h_k = \frac12\,\mathrm{sinc}(k/2) = \frac12\cdot\frac{\sin(\pi k/2)}{\pi k/2}.$$
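Formula (2.15) can be checked against a direct numerical inverse transform (2.11) over the pass-band; a sketch, not from the book:

```python
# Checking (2.15) against a numerical inverse Fourier transform over the
# pass-band |w| < pi/2; the value h_0 = 1/2 comes from sinc(0) = 1.
import math

def h_formula(k):
    """h_k = (1/2) sinc(k/2) = sin(pi k / 2) / (pi k), with h_0 = 1/2."""
    if k == 0:
        return 0.5
    return math.sin(math.pi * k / 2) / (math.pi * k)

def h_numeric(k, steps=20000):
    """Midpoint rule for (1/2pi) * integral of e^{iwk} over |w| < pi/2."""
    dw = math.pi / steps
    total = 0.0
    for i in range(steps):
        w = -math.pi / 2 + (i + 0.5) * dw
        total += math.cos(w * k) * dw   # imaginary part integrates to zero
    return total / (2 * math.pi)

for k in range(-4, 5):
    assert abs(h_formula(k) - h_numeric(k)) < 1e-6
print([round(h_formula(k), 4) for k in range(4)])
```

Note that the coefficients vanish for even k ≠ 0 and decay only like 1/k, so the ideal lowpass filter is IIR.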
Exercises 2.3
2.5. First, show that the filter coefficients of the ideal lowpass filter are given
by (2.15). Then, compute the filter coefficients (gk ) of the ideal highpass filter
$$G(\omega) = \begin{cases} 0, & |\omega| < \pi/2, \\ 1, & \pi/2 < |\omega| < \pi. \end{cases}$$
frequency components an equal amount. Later, we will also see how linear phase in a filter bank corresponds to symmetric (and non-orthogonal) wavelets.
Example 2.7. From Example 2.5 we see that the phase of the lowpass filter h0 = h1 = 1/2 is φ(ω) = −ω/2.

A filter that is symmetric around zero, hk = h−k, has a real and even frequency response function. The filter then has zero phase, φ(ω) = 0. A filter that is antisymmetric around zero will have an imaginary, and odd, frequency response function with phase π/2 or −π/2. If hk = −h−k we have h0 = 0 and

$$H(\omega) = -2i\,(h_1\sin\omega + h_2\sin 2\omega + \cdots).$$

Note that the sign of the factor (h1 sin ω + h2 sin 2ω + · · · ) determines whether the phase is π/2 or −π/2, and that this depends on the frequency ω. (−2i = 2e^{−iπ/2}.)
A causal filter cannot have zero or constant phase. A causal filter can, on the other hand, be symmetric or antisymmetric, but not around zero. Causal filters have linear phase when they are symmetric or antisymmetric: hk = hN−k or hk = −hN−k, respectively. Here we have assumed that we have a FIR filter with nonzero coefficients h0, h1, . . . , hN. We will then have a factor e^{−iNω/2} in H(ω), and we see the linear term −Nω/2 in the phase.
Example 2.8. The FIR filter with nonzero coefficients h0 = h2 = 1/2 and
h1 = 1 is symmetric, and
$$H(\omega) = \frac12 + e^{-i\omega} + \frac12 e^{-2i\omega} = e^{-i\omega}(1 + \cos\omega).$$
The filter has linear phase, φ(ω) = −ω, since 1 + cos ω ≥ 0 for all ω.
H(z) = ±H(z^{−1}).

Conclusion: for symmetric and antisymmetric filters, the zeros of H(z) must come in pairs as z_i and z_i^{−1}. When we construct wavelets later in this book, we will see that symmetric wavelets correspond to symmetric filters in a filter bank.
The group delay of a filter is defined as

$$\tau(\omega) = -\frac{d\phi}{d\omega},$$
where φ(ω) is the phase of the filter. The group delay measures the delay at
the frequency ω.
Example 2.9. Suppose that the input x of the linear phase filter in Example
2.8 equals the sum of two pure frequencies, xk = e^{iω1k} + e^{iω2k}. Since φ(ω) = −ω, the group delay τ(ω) = 1 and the output y then equals

$$y_k = H(\omega_1)e^{i\omega_1 k} + H(\omega_2)e^{i\omega_2 k} = |H(\omega_1)|e^{-i\omega_1}e^{i\omega_1 k} + |H(\omega_2)|e^{-i\omega_2}e^{i\omega_2 k} = |H(\omega_1)|e^{i\omega_1(k-1)} + |H(\omega_2)|e^{i\omega_2(k-1)}.$$
We see that the two oscillations are both delayed one step.
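Example 2.9 can be played out numerically; a sketch (the two frequencies w1, w2 are arbitrary choices, not from the book):

```python
# The linear-phase filter of Example 2.8, h = (1/2, 1, 1/2), delays a sum
# of two pure frequencies by exactly one step, since its group delay is 1.
import cmath, math

h = [0.5, 1.0, 0.5]
w1, w2 = 0.7, 2.1          # arbitrary test frequencies in (0, pi)

def x(k):                  # two-tone input x_k = e^{i w1 k} + e^{i w2 k}
    return cmath.exp(1j * w1 * k) + cmath.exp(1j * w2 * k)

for k in range(5):
    y_k = sum(hn * x(k - n) for n, hn in enumerate(h))
    # predicted: each tone scaled by |H(w)| = 1 + cos(w), delayed one step
    pred = ((1 + math.cos(w1)) * cmath.exp(1j * w1 * (k - 1))
            + (1 + math.cos(w2)) * cmath.exp(1j * w2 * (k - 1)))
    assert abs(y_k - pred) < 1e-12
print("both tones delayed by one step")
```

Because the phase is exactly linear, both tones are delayed by the same single step; a nonlinear phase would delay them by different amounts and distort the waveform.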
Exercises 2.4
2.6. Show that the frequency response of the highpass filter g0 = 1/2 and g1 = −1/2 can be written as

$$G(\omega) = ie^{-i\omega/2}\sin(\omega/2).$$

Then compute and plot the magnitude and phase of G(ω). Note that the factor sin(ω/2) is not positive for all |ω| < π.
$$(2.16)\qquad \langle x, y\rangle = x_1\overline{y_1} + \cdots + x_n\overline{y_n},$$

and the vectors are orthogonal if ⟨x, y⟩ = 0. The vectors ϕ^(1), . . . , ϕ^(n) form a basis of R^n (or C^n) if every vector x ∈ R^n (or C^n) can be written uniquely as

$$x = a_1\varphi^{(1)} + \cdots + a_n\varphi^{(n)}.$$
(2.18) hx, yi = a1 b1 + · · · + an bn .
Example 2.10. In R2 the natural basis is given by two vectors: δ (1) = (1, 0)t
and δ (2) = (0, 1)t . Another orthonormal basis of R2 is the natural basis
rotated 45 degrees counter-clockwise and has the basis vectors
$$\varphi^{(1)} = \frac{1}{\sqrt 2}\begin{pmatrix}1\\1\end{pmatrix}, \qquad \varphi^{(2)} = \frac{1}{\sqrt 2}\begin{pmatrix}-1\\1\end{pmatrix}.$$
If we take the inner product of both sides of this equation with the basis
vector ϕ(j) we obtain an expression for the coordinate aj ,
$$a_j = \sum_{k=1}^{n} \tilde a_k\,\langle\tilde\varphi^{(k)}, \varphi^{(j)}\rangle \quad\Leftrightarrow\quad a = P\tilde a.$$
The equivalence follows from the definition of matrix multiplication, and the matrix P has elements P_{jk} = ⟨ϕ̃^(k), ϕ^(j)⟩. The matrix P is orthogonal since PP^t = P^tP = I. For a vector x ∈ C^n the above also holds true, but the matrix P is then unitary, PP∗ = P∗P = I. Here P∗ is the adjoint, or complex conjugate transpose, of P: (P∗)_{jk} = P̄_{kj}. Note that for a unitary, or orthogonal, matrix we have
To make this transformation orthogonal we should replace 1/n by 1/√n, but the convention is to define it as we have. We obtain x from X through the inversion formula

$$x_k = \sum_{j=0}^{n-1} X_j W^{jk}.$$
$$\mathbf{Z}^2 := \{(k_x, k_y)^t : k_x, k_y \in \mathbf{Z}\}.$$
that is, h = Hδ. Just as for signals we can now write the output g = Hf as
a convolution (in two dimensions):

$$g = \sum_{n\in\mathbf{Z}^2} f_n S^n h =: h * f.$$
2.7 Sampling
Here we will briefly explain the mathematical basis of exchanging a function
for its sample values at, say, the integers. The fundamental result is Poisson’s
summation formula.
For a continuous-time signal f (t) we define its Fourier transform as
$$\hat f(\omega) = \int_{-\infty}^{\infty} f(t)e^{-i\omega t}\,dt,$$
Now, if the function f has been lowpass filtered with cut-off frequency π, that is, f̂(ω) = 0 for |ω| ≥ π, then in the pass-band |ω| < π we have

$$\hat f(\omega) = \sum_l f(l)e^{-il\omega}.$$
Here we have complete information about the Fourier transform f̂(ω) in terms of the sample values f(l). Thus, applying the inverse Fourier transform, we get a formula, the Sampling Theorem, reconstructing the function from its sample values:

$$f(t) = \frac{1}{2\pi}\int_{-\pi}^{\pi}\sum_l f(l)e^{-il\omega}e^{i\omega t}\,d\omega = \sum_l f(l)\,\frac{\sin\pi(t-l)}{\pi(t-l)}.$$
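The reconstruction formula can be tried numerically; a sketch (not from the book) with an arbitrarily chosen band-limited test signal, truncating the sample sum at |l| ≤ L:

```python
# A band-limited signal (frequencies 1.5 and 0.8, both below pi) is
# recovered from its integer samples by the truncated cardinal series;
# the slow sinc decay makes the truncation error of order 1/L.
import math

def f(t):
    return math.cos(1.5 * t) + 0.5 * math.sin(0.8 * t)

def sinc(u):
    return 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)

def reconstruct(t, L=5000):
    return sum(f(l) * sinc(t - l) for l in range(-L, L + 1))

for t in (0.3, 1.7, -2.4):
    assert abs(reconstruct(t) - f(t)) < 1e-2
print(round(reconstruct(0.3), 4), round(f(0.3), 4))
```

The large truncation length L needed for modest accuracy illustrates the poor time localization of the sinc kernel, one motivation for wavelet expansions.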
Exercises 2.7
2.7. Consider the function f (t) = sin πt. What happens when this is sam-
pled at the integers? Relate this to the conditions in the Sampling Theorem.
2.8. Prove Poisson’s summation formula (2.19), noting that the left-hand
side has period 2π, and that the right-hand side is a Fourier series with the
same period.
2.9. Work out Poisson’s summation formula (2.19) when the function f (t)
is sampled at the points t = 2−J k. What should be the maximum cut-off
frequency in this case? (ω = 2J π)
Chapter 3
Filter Banks
The study of filter banks in signal processing was one of the paths that led to wavelets. Signal processing has a long tradition of developing different systems of basis functions to represent signals. In many applications it is desirable to have basis functions that are well localized in time and in frequency. For computational purposes these functions should also have a simple structure and allow for fast computations. Wavelets satisfy all these criteria.
The fast computation and representation of functions in wavelet bases is
intimately connected to filter banks. As a matter of fact, the so-called fast
wavelet transform is performed as a repeated application of the low- and
highpass filters in a filter bank. Filter banks operate in discrete time and
wavelets in continuous time. Wavelets are discussed in the next chapter.
for some numbers (cn ). Here we encounter a problem not present in finite
dimensions. The series expansion involves an infinite number of terms, and
this series must converge. It turns out that we can not find a basis for all
signals, and we will here restrict ourselves to those with finite energy. Then,
we can carry over all the concepts from finite-dimensional vector spaces in a
fairly straightforward way.
Accordingly, the equality in (3.1) means that the sequence of partial sums

$$s^{(N)} = \sum_{n=-N}^{N} c_n\varphi^{(n)},$$

is convergent with limit x, that is,

$$\big\|s^{(N)} - x\big\| \to 0, \quad\text{as } N\to\infty.$$
is also complete. In this chapter we will assume that all signals are contained
in this space.
The basis functions (ϕ^(n)) are now formed as the even translates of these two prototypes (see Figure 3.1):

$$\varphi^{(2n)}_k = \varphi_{k-2n}, \quad\text{and}\quad \varphi^{(2n+1)}_k = \psi_{k-2n}.$$
$$(3.4)\qquad y^{(0)}_n := c_{2n} = \langle x, \varphi^{(2n)}\rangle = \frac{1}{\sqrt2}\,(x_{2n} + x_{2n+1}),$$
$$\phantom{(3.4)}\qquad y^{(1)}_n := c_{2n+1} = \langle x, \varphi^{(2n+1)}\rangle = \frac{1}{\sqrt2}\,(x_{2n} - x_{2n+1}),$$

in other words, weighted averages and differences of pairwise values of x. Another way of interpreting this is to say that we take pairwise values of x, and then rotate the coordinate system in the plane (R²) 45 degrees counter-clockwise. Here, we have also introduced the sequences y^(0) and y^(1), consisting of the even- and odd-indexed coordinates, respectively. The basis functions form an orthonormal basis of ℓ²(Z), and we can therefore reconstruct x from its coordinates:

$$(3.5)\qquad x = \sum_n \langle x, \varphi^{(n)}\rangle\,\varphi^{(n)}.$$
[Figure 3.1: the basis functions ϕ^(2n) and ϕ^(2n+1), even translates of ϕ^(0) and ϕ^(1).]
Analysis
We will now show how we can use a filter bank to compute the coordinates
in (3.4). If we define the impulse responses of two filters H and G as
h_k = ϕ_k and g_k = ψ_k,

and if we let h∗ and g∗ denote the time-reverses of these filters, we can write the inner products in (3.4) as a convolution:

$$y^{(0)}_n = \sum_k x_k\,\varphi^{(2n)}_k = \sum_k x_k h_{k-2n} = \sum_k x_k h^*_{2n-k} = (x * h^*)_{2n}.$$

Similarly, we get y^{(1)}_n = (x ∗ g∗)_{2n}.
The conclusion is that we can compute y (0) and y (1) by filtering x with H ∗
and G∗ , respectively, and then downsampling the output of these two filters
(see Figure 3.2). Downsampling removes all odd-indexed values of a signal,
and we define the downsampling operator (↓ 2) as
(↓ 2)x = (. . . , x−2 , x0 , x2 , . . . ).
Thus, we have

$$y^{(0)} = (\downarrow 2)(H^*x), \quad\text{and}\quad y^{(1)} = (\downarrow 2)(G^*x).$$
Here, H ∗ and G∗ denote the low- and highpass filters with impulse responses
h∗ and g ∗, respectively. These two filters are non-causal since their impulse
responses are the time-reverses of causal filters. This is not necessarily a
problem in applications, where the filters can always be made causal by
delaying them a certain number of steps; the output is then delayed an equal
number of steps.
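The analysis step can be sketched in code (not the book's program): computing the coordinates (3.4) as correlation with the Haar filters followed by downsampling.

```python
# A sketch of the analysis step: y(0)_n = (x * h*)_{2n} and
# y(1)_n = (x * g*)_{2n} for the Haar filters h and g.
import math

r = 1 / math.sqrt(2)
h = [r, r]        # lowpass:  h_k = phi_k
g = [r, -r]       # highpass: g_k = psi_k

def correlate_down(x, flt):
    """(x * flt*)_{2n} = sum_k x_k flt_{k-2n}, kept only at even shifts 2n."""
    out = []
    for n in range(len(x) // 2):
        out.append(sum(x[2 * n + j] * flt[j] for j in range(len(flt))
                       if 2 * n + j < len(x)))
    return out

x = [4.0, 2.0, 5.0, 5.0]
y0, y1 = correlate_down(x, h), correlate_down(x, g)
# agrees with the weighted means and differences of (3.4)
assert all(abs(a - b) < 1e-12 for a, b in
           zip(y0, [r * (x[0] + x[1]), r * (x[2] + x[3])]))
assert all(abs(a - b) < 1e-12 for a, b in
           zip(y1, [r * (x[0] - x[1]), r * (x[2] - x[3])]))
# the basis is orthonormal, so energy is preserved
assert abs(sum(v * v for v in x) - sum(v * v for v in y0 + y1)) < 1e-10
print(y0, y1)
```

The last assert is Parseval's relation for this orthonormal basis: the filter bank redistributes, but does not change, the energy of the signal.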
[Figure 3.2: the analysis part of the filter bank: the input x is filtered with H∗ and G∗, and the outputs are downsampled to give y^(0) and y^(1).]
Synthesis
So far we have seen how we can analyze a signal, or compute its coordinates,
in the Haar basis using a filter bank. Let us now demonstrate how we can
synthesize, or reconstruct, a signal from the knowledge of its coordinates.
From the definition of the filters H and G, and from the reconstruction
formula (3.5) we get
$$x_k = \sum_n y^{(0)}_n\,\varphi_{k-2n} + \sum_n y^{(1)}_n\,\psi_{k-2n} = \sum_n y^{(0)}_n h_{k-2n} + \sum_n y^{(1)}_n g_{k-2n} = (v^{(0)} * h)_k + (v^{(1)} * g)_k,$$

where v^(0) = (↑ 2)y^(0) and v^(1) = (↑ 2)y^(1), see Exercise 3.1. Here the upsampling operator (↑ 2) is defined as

$$(\uparrow 2)y = (\ldots, y_{-1}, 0, y_0, 0, y_1, \ldots).$$

In more compact notation,

$$x = v^{(0)} * h + v^{(1)} * g = H(\uparrow 2)y^{(0)} + G(\uparrow 2)y^{(1)} =: x^{(0)} + x^{(1)}.$$
The signals x(0) and x(1) are obtained by first upsampling y (0) and y (1) , and
then filtering the result with H and G, respectively; see Figure 3.3. On the
other hand, if we look back at the reconstruction formula (3.5), we can write
x as
$$x = x^{(0)} + x^{(1)} = \sum_n \langle x, \varphi^{(2n)}\rangle\varphi^{(2n)} + \sum_n \langle x, \varphi^{(2n+1)}\rangle\varphi^{(2n+1)},$$
which means that x(0) and x(1) are the orthogonal projections of x onto the
subspaces spanned by the even and odd basis functions, respectively.
[Figure 3.3: the synthesis part of the filter bank: y^(0) and y^(1) are upsampled, filtered with H and G, and added to give x.]
Exercises 3.2
3.1. Show that (v ∗ h)_k = Σ_n y_n h_{k−2n}, where v = (↑ 2)y. Observe that v_{2n} = y_n.
Downsampling
The downsampling operator (↓ 2) removes all odd-indexed values of a signal,
and is consequently defined as
(↓ 2)x = (. . . , x−2 , x0 , x2 , . . . ).
If we let y = (↓ 2)x we have y_k = x_{2k}, and in the z-domain we get (see Exercise 3.2)

$$(3.6)\qquad Y(z) = \frac12\Big(X\big(z^{1/2}\big) + X\big(-z^{1/2}\big)\Big).$$
Upsampling
The upsampling operator (↑ 2) inserts a zero between every value of a signal,

$$(\uparrow 2)y = (\ldots, y_{-1}, 0, y_0, 0, y_1, \ldots).$$

If u = (↑ 2)y we have, in the z-domain (see Exercise 3.2),

$$(3.7)\qquad U(z) = Y(z^2),$$

and in the frequency domain U(ω) = Y(2ω).
Exercises 3.3
3.2. Prove relations (3.6) and (3.7).
3.3. Show that if Y(z) = X(z^{1/2}) then Y(ω) = X(ω/2), and if U(z) = Y(z²) then U(ω) = Y(2ω).
3.4. Assume that a filter H has the transfer function H(z). Show that the time-reverse H∗ of the filter has the transfer function H∗(z) = H(z^{−1}).
[Figure 3.4: the complete filter bank: analysis with H̃∗ and G̃∗ followed by downsampling, then upsampling and synthesis with H and G, producing x̂ = x^(0) + x^(1).]
Reconstruction Conditions
The results of the previous section for the down- and upsampling give us the following expressions for the z-transforms of x^(0) and x^(1):

$$X^{(0)}(z) = \frac12\,H(z)\Big[X(z)\widetilde H^*(z) + X(-z)\widetilde H^*(-z)\Big],$$
$$X^{(1)}(z) = \frac12\,G(z)\Big[X(z)\widetilde G^*(z) + X(-z)\widetilde G^*(-z)\Big].$$

Adding these together, we obtain an expression for the z-transform of x̂:

$$\widehat X(z) = \frac12\Big[H(z)\widetilde H^*(z) + G(z)\widetilde G^*(z)\Big]X(z) + \frac12\Big[H(z)\widetilde H^*(-z) + G(z)\widetilde G^*(-z)\Big]X(-z),$$

where we have grouped the terms with the factors X(z) and X(−z) together, respectively.
From this we see that we get perfect reconstruction, that is x̂ = x, if the factor in front of X(z) equals one, and the factor in front of X(−z) equals zero. We thus get the following two conditions on the filters:

$$(3.8)\qquad H(z)\widetilde H^*(z) + G(z)\widetilde G^*(z) = 2,$$
$$(3.9)\qquad H(z)\widetilde H^*(-z) + G(z)\widetilde G^*(-z) = 0.$$
The first condition ensures that there is no distortion of the signal, and the
second condition that the alias component X(−z) is cancelled. These two
conditions will appear again when we study wavelets. This is the key to the
connection between filter banks and wavelets. Due to different normaliza-
tions, the right-hand side in the no distortion condition equals 1 for wavelets
though.
$$(3.10)\qquad G(z) = -z^{-L}\widetilde H^*(-z), \quad\text{and}\quad \widetilde G(z) = -z^{-L}H^*(-z),$$
where L is an arbitrary odd integer. In Exercise 3.6 below you are to verify
that this choice cancels the alias, and then you will also see why L has
to be odd. In this book we will in most cases choose L = 1. Choosing the highpass filters in this way, the no distortion condition becomes

$$H(z)\widetilde H^*(z) + H(-z)\widetilde H^*(-z) = 2.$$
Writing P(z) := H(z)H̃∗(z) for the product filter, we conclude that all even powers in P(z) must be zero, except the constant term, which should equal one. The odd powers all cancel and are the design variables in a filter bank.
The design of a perfect reconstruction filter bank is then a question of finding a product filter P(z) satisfying condition (3.11). Once such a product filter has been found, it is factored in some way as P(z) = H(z)H̃∗(z). The highpass filters are then given by equation (3.10).
Example 3.1. Let us see if the discrete-time Haar basis satisfies the perfect
reconstruction condition. The filters are given by
H(z) = H̃(z) = (1/√2)(1 + z^{-1}),   G(z) = G̃(z) = (1/√2)(1 − z^{-1}).

The product filter becomes

P(z) = H(z)H̃∗(z) = H(z)H(z^{-1}) = ½ (z + 2 + z^{-1}).
This product filter indeed satisfies the perfect reconstruction condition, since
all even powers equal zero except the constant term p0 = 1.
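This check is easy to carry out numerically. The following sketch (Python with NumPy; the coefficient arrays and variable names are ours, not the book's) forms P(z) = H(z)H(z^{-1}) as the autocorrelation of the Haar filter and inspects its even powers.

```python
import numpy as np

# Haar lowpass filter H(z) = (1 + z^{-1}) / sqrt(2): coefficients of z^0, z^{-1}.
h = np.array([1.0, 1.0]) / np.sqrt(2.0)

# P(z) = H(z) H(z^{-1}) is the autocorrelation of h; the entries of p are
# the coefficients of z^{1}, z^{0}, z^{-1}, in this order.
p = np.convolve(h, h[::-1])
print(p)  # [0.5 1.  0.5]  ->  P(z) = (z + 2 + z^{-1}) / 2

# Perfect reconstruction: the constant term is 1 and every other even power
# of P(z) vanishes (trivially so here, since there are no other even powers).
center = len(h) - 1
assert abs(p[center] - 1.0) < 1e-12
```

The same two lines (`np.convolve` plus a look at the even-indexed entries around the center) work for any candidate product filter.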
Biorthogonal Bases
Recall that an orthogonal filter bank could be seen as a realization of the expansion of a signal into a special type of discrete-time basis. This basis was formed by the even translates of two basis functions ϕ and ψ, where ϕ_k = h_k and ψ_k = g_k. And we had

x = Σ_n ⟨x, ϕ^{(2n)}⟩ ϕ^{(2n)} + Σ_n ⟨x, ϕ^{(2n+1)}⟩ ϕ^{(2n+1)},

where ϕ^{(2n)}_k = ϕ_{k−2n} and ϕ^{(2n+1)}_k = ψ_{k−2n}. Now, a biorthogonal filter bank corresponds to the biorthogonal expansion

x = Σ_n ⟨x, ϕ̃^{(2n)}⟩ ϕ^{(2n)} + Σ_n ⟨x, ϕ̃^{(2n+1)}⟩ ϕ^{(2n+1)},

where ϕ̃^{(2n)}_k = ϕ̃_{k−2n} and ϕ̃^{(2n+1)}_k = ψ̃_{k−2n}; here ϕ̃_k = h̃_k and ψ̃_k = g̃_k.
Exercises 3.4
3.6. Verify that the alias cancellation choice (3.10) of the highpass filters
implies that condition (3.9) is satisfied.
3.7. There exists a filter bank that is even simpler than the one based on the
Haar basis – the lazy filter bank. It is orthogonal and given by the lowpass
filter H(z) = z^{-1}. What are the corresponding highpass filters? What is the product filter, and does it satisfy the perfect reconstruction condition?
The signals y (0) and y (1) are then the odd- and even-indexed values of x,
respectively. Verify this! What are the signals x(0) and x(1) ? and is their
sum equal to x?
Factorization
Let us first assume that we want to construct an orthogonal filter bank using
the symmetric Daubechies product filter. Then, since P(z) = H(z)H(z^{-1}), we know that the zeros of P(z) always come in pairs z_k and z_k^{-1}. When we factor P(z) we can, for each zero z_k, let either (z − z_k) or (z − z_k^{-1}) be a factor of H(z). If we always choose the zero that is inside or on the unit circle, |z_k| ≤ 1, then H(z) is called the minimum phase factor of P(z).
Now, suppose we also want the filter H(z) to be symmetric. Then the zeros of H(z) must come together in pairs z_k and z_k^{-1}. But this contradicts the orthogonality condition except for the Haar basis, where both zeros are at z = −1. Thus, orthogonal filter banks cannot have symmetric filters.¹
In a biorthogonal basis, or filter bank, we factor the product filter as P(z) = H(z)H̃∗(z). There are several ways of doing so, and we then obtain several different filter banks for a given product filter. In most cases, we want the filters H(z) and H̃∗(z) to be symmetric, unless we are designing an orthogonal filter bank, that is.
Finally, since we want both H(z) and H̃∗(z) to have real coefficients, we always let the complex conjugate zeros z_k and z̄_k belong together, to either H(z) or H̃∗(z).
Let us again illustrate this with an example.
Example 3.4. For N = 2 the Daubechies product filter was given by

P(z) = (1/16)(−z^3 + 9z + 16 + 9z^{-1} − z^{-3})
     = ((1 + z)/2)^2 ((1 + z^{-1})/2)^2 (−z + 4 − z^{-1}).

¹We have assumed that all filters are FIR. A filter bank with IIR symmetric filters can be orthogonal.
This polynomial has four zeros at z = −1, one at z = 2 − √3, and one at z = 2 + √3. Two possible factorizations of this product filter are:
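Numerically, the minimum phase factorization can be sketched as follows (NumPy; the normalization H(1) = √2, matching Example 3.1's convention, and the variable names are our assumptions). H(z) takes two of the four zeros at z = −1 plus the zero 2 − √3 that lies inside the unit circle.

```python
import numpy as np

# Zeros assigned to H(z): two of the four at z = -1 (they sit on the unit
# circle and are shared equally with H(z^{-1})), plus the minimum-phase
# zero 2 - sqrt(3); the zero 2 + sqrt(3) goes to H(z^{-1}).
roots_H = [-1.0, -1.0, 2.0 - np.sqrt(3.0)]
h = np.poly(roots_H)               # monic cubic with these zeros
h *= np.sqrt(2.0) / h.sum()        # normalize so H(1) = sqrt(2)
print(np.round(h, 4))              # the Daubechies-4 lowpass coefficients

# Recover the product filter P(z) = H(z)H(z^{-1}) and compare with the
# coefficients (1/16)(-1, 0, 9, 16, 9, 0, -1) quoted in Example 3.4.
p = np.convolve(h, h[::-1])
assert np.allclose(p, np.array([-1, 0, 9, 16, 9, 0, -1]) / 16.0)
assert abs(np.dot(h, h) - 1.0) < 1e-10   # unit energy: an orthogonal filter
```

Choosing the zero 2 + √3 instead gives the maximum phase factor, the time-reverse of the same filter.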
3.6 Notes
There are two books that we direct the reader to, where the wavelet theory is started from the study of filter banks. These are Wavelets and Filter Banks by Strang and Nguyen [27], and Wavelets and Subband Coding by Vetterli and Kovačević [30]. These books are appropriate for an engineer or an undergraduate student with a background in signal processing.
Chapter 4
Multiresolution Analysis
Definition 4.1. We define L²(R) as the set of all functions f(t) such that

∫_{−∞}^{∞} |f(t)|² dt < ∞.
1. ‖f‖ ≥ 0 and ‖f‖ = 0 ⇒ f = 0,
2. ‖cf‖ = |c| ‖f‖, for c ∈ C,
3. ‖f + g‖ ≤ ‖f‖ + ‖g‖. (The triangle inequality)
Remark. The space L²(R) endowed with the scalar product is complete: if we have a Cauchy sequence, ‖f_n − f_m‖ → 0 as m, n → ∞, then this sequence converges in L²(R) to a limit f: ‖f_n − f‖ → 0 as n → ∞. A normed vector space with a scalar product, which is also complete, is termed a Hilbert space. Thus, L²(R) is a Hilbert space, as well as the spaces R^n, C^n, and ℓ²(Z).
The space L²(R) contains all physically realizable signals. It is also the natural setting for the continuous-time Fourier transform:

Ff(ω) = f̂(ω) = ∫_{−∞}^{∞} f(t) e^{−iωt} dt.
By Parseval's formula, ‖Ff‖² = 2π‖f‖², or ‖Ff‖ = (2π)^{1/2} ‖f‖. The quantity |f̂(ω)|²/(2π) can then be interpreted as the energy density at frequency ω. Integrating this energy density over all frequencies gives the total energy of the signal, according to Parseval's formula. We finally remark that Parseval's formula is a special case of the seemingly more general Plancherel's formula

∫_{−∞}^{∞} f̂(ω) \overline{ĝ(ω)} dω = 2π ∫_{−∞}^{∞} f(t) \overline{g(t)} dt,
‖f − w‖ ≤ ‖f − v‖, for all v ∈ V.
[Figure: the orthogonal projection. f decomposes as f = P_V f + P_{V^⊥} f, and P_V f is the point in V closest to f.]
Riesz Bases
The notion of a basis in linear spaces extends from finite dimensions and
ℓ2 (Z) to L2 (R). We say that a collection {ϕk }k∈Z of functions is a basis for
a linear subspace V if any function f ∈ V can be written uniquely as

(4.3) f = Σ_k c_k ϕ_k.
We also say that V is spanned by the functions ϕk . The sum (4.3) should
be interpreted as the limit of finite sums when the number of terms goes to
infinity. More precisely, ‖f − s_K‖ → 0 as K → ∞, where s_K is the finite sum¹

s_K = Σ_{k=−K}^{K} c_k ϕ_k.
In other words, the energy of f − sK goes to zero when more and more terms
in the sum are added. A fundamental fact about the scalar product that we
will use throughout the book is:

(4.4) ⟨Σ_k c_k ϕ_k, g⟩ = Σ_k c_k ⟨ϕ_k, g⟩.
Small errors in the signal will then give small errors in the coefficients and vice versa, provided that A^{-1} and B are of moderate size. A perhaps more relevant result involves relative errors:

‖f − f̃‖/‖f‖ ≤ √(B/A) ‖c − c̃‖/‖c‖   and   ‖c − c̃‖/‖c‖ ≤ √(B/A) ‖f − f̃‖/‖f‖.
¹Strictly speaking, we should have used two indices K1 and K2 that independently go to infinity.
Here, ‖c‖ is the ℓ²(Z)-norm. The number √(B/A) has a name: the condition number. It gives an upper bound on how much relative errors can grow when passing between f and its coefficients (c_k). Since we always have A ≤ B, the
condition number must be at least one. The optimal case occurs when the
condition number is 1, A = B = 1. We then have an orthonormal (ON) basis.
For ON-bases we have ⟨ϕ_k, ϕ_l⟩ = δ_{k,l}, where we have used the Kronecker delta symbol

δ_{k,l} = 1 if k = l, and 0 otherwise.
Taking scalar products with ϕ_l in (4.3) gives c_l = ⟨f, ϕ_l⟩ and thus every f ∈ V can be written as

(4.5) f = Σ_k ⟨f, ϕ_k⟩ ϕ_k.

For the orthogonal projection P_V onto V, it is easy to show that, for any f ∈ L²(R), ⟨P_V f, ϕ_k⟩ = ⟨f, ϕ_k⟩. We then have

P_V f = Σ_k ⟨f, ϕ_k⟩ ϕ_k.
In general, there is no unique choice of the dual basis {ϕ̃_k}. But if we require that the linear space spanned by the dual basis equals V, there is just one such basis. This is, for instance, the case when V = L²(R).
Exercises 4.1
4.1. Show that the set of band-limited functions is a subspace of L2 (R). Also
show that it is closed (quite difficult).
4.2. Verify that the projection operator in Example 4.1 is given by (4.2).
Hint: Show that

0 = ⟨f − P_V f, v⟩ = (1/2π) ⟨f̂ − \widehat{P_V f}, v̂⟩,

for each band-limited v.
4.3. Show that if {ϕ_k} is an orthonormal basis for V, and P_V is the orthogonal projection onto V, then, for any f,

⟨P_V f, ϕ_k⟩ = ⟨f, ϕ_k⟩.
4.4. Prove (4.4). Hint: Let f be the sum in (4.3) and let s_K be the finite sum. First show that

⟨s_K, g⟩ = Σ_{k=−K}^{K} c_k ⟨ϕ_k, g⟩.

Then use

|⟨f, g⟩ − ⟨s_K, g⟩| = |⟨f − s_K, g⟩| ≤ ‖f − s_K‖ ‖g‖ → 0 as K → ∞.
[Figure: the piecewise constant approximation f₁(t) (left) and two of the scaling functions, ϕ(2t − 1) and ϕ(2t − 4) (right).]
(4.6) is denoted V₁. The coefficients (s_{1,k}) may be chosen as the mean values of f over the intervals (k/2, (k + 1)/2),

s_{1,k} = 2 ∫_{k/2}^{(k+1)/2} f(t) dt = 2 ∫_{−∞}^{∞} f(t) ϕ(2t − k) dt.
If the coefficients (s_{0,k}) are chosen as mean values over the intervals
[Figure: the coarser approximation f₀(t) (left) and the scaling function ϕ(t − 1) (right).]
Figure 4.4: The hat scaling function and a function in the corresponding V0
space.
The first condition just states that functions in V_{j+1} contain more details than functions in V_j: in a certain sense, we add information when we approximate a function at a finer scale. The second condition says that V_{j+1} approximates functions at twice as fine a scale as V_j, and also gives a connection between the spaces V_j. The fifth condition requires the approximation spaces to be spanned by scaling functions. Let us introduce the dilated, translated, and normalized scaling functions

(4.7) ϕ_{j,k}(t) = 2^{j/2} ϕ(2^j t − k), j, k ∈ Z.
After a scaling by 2^j, it is easy to see that for fixed j, the scaling functions ϕ_{j,k} constitute a Riesz basis for V_j. Thus, every f_j ∈ V_j can be written as

(4.8) f_j(t) = Σ_k s_{j,k} ϕ_{j,k}(t).
The reason for the factors 2^{j/2} is that all scaling functions have equal norms, ‖ϕ_{j,k}‖ = ‖ϕ‖.
The remaining two conditions are of a more technical nature. They are
needed to ensure that the wavelets, which we will introduce soon, give a Riesz
basis for L2 (R). The third basically says that any function can be approx-
imated arbitrarily well with a function fj ∈ Vj , if we just choose the scale
fine enough. This is what is meant by density. Finally, the fourth condition
says, loosely speaking, that the only function that can be approximated at
an arbitrarily coarse scale is the zero function.
(4.10) ϕ(t) = 2 Σ_k h_k ϕ(2t − k),

for some coefficients (h_k). This is the scaling equation. Taking the Fourier transform of the scaling equation gives us (Exercise 4.10)

(4.11) ϕ̂(ω) = H(ω/2) ϕ̂(ω/2),

where

H(ω) = Σ_k h_k e^{−ikω}.
Letting ω = 0 and using ϕ̂(0) = 1, we get H(0) = Σ_k h_k = 1, and we see that the coefficients (h_k) may be interpreted as an averaging filter. In fact, as we will later see, also H(π) = 0 holds, and thus H is a lowpass filter. It can be shown that the scaling function is uniquely defined by this filter together with the normalization (4.9). As a matter of fact, repeating (4.11) and again using ϕ̂(0) = 1 yields (under certain conditions) the infinite product formula

(4.12) ϕ̂(ω) = Π_{j>0} H(ω/2^j).
The properties of the scaling function are reflected in the filter coefficients
(hk ), and scaling functions are usually constructed by designing suitable fil-
ters.
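For the Haar filter the infinite product (4.12) can be verified numerically. The sketch below (NumPy; the truncation level J = 40 is an arbitrary choice of ours) compares the truncated product against the known transform of the box function on [0, 1):

```python
import numpy as np

# Haar filter coefficients h_0 = h_1 = 1/2, so H(0) = 1 as required.
def H(w):
    return 0.5 * (1.0 + np.exp(-1j * w))

# Truncated infinite product (4.12): phi_hat(w) ~ prod_{j=1..J} H(w / 2^j).
def phi_hat(w, J=40):
    out = np.ones_like(w, dtype=complex)
    for j in range(1, J + 1):
        out *= H(w / 2.0**j)
    return out

# The Haar scaling function is 1 on [0, 1), with exact Fourier transform
# phi_hat(w) = (1 - e^{-iw}) / (iw).
w = np.linspace(0.1, 20.0, 200)
exact = (1.0 - np.exp(-1j * w)) / (1j * w)
assert np.allclose(phi_hat(w), exact, atol=1e-8)
```

The factors H(ω/2^j) tend to 1 so quickly that forty of them already match the exact transform to eight decimals.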
Example 4.2. The B-spline of order N is defined by the convolution

S_N(t) = χ(t) ∗ . . . ∗ χ(t), (N factors)

where χ(t) denotes the Haar scaling function. The B-spline is a scaling function (Exercise 4.7) for each N. The translated functions S_N(t − k) give N − 2 times continuously differentiable, piecewise polynomial approximations of degree N − 1. The cases N = 1 and N = 2 correspond to the Haar and the hat scaling functions, respectively. When N grows larger, the scaling functions become more and more regular, but also more and more spread out. We will describe wavelets based on B-spline scaling functions in Chapter 8.
Example 4.3. The sinc function

ϕ(t) = sinc t = sin(πt)/(πt),

is another scaling function (Exercise 4.8). It does not have compact support, and the decay as |t| → ∞ is very slow. Therefore, it is not used in practice. It has interesting theoretical properties though.
It is in a sense dual to the Haar scaling function, since its Fourier transform is given by the box function

ϕ̂(ω) = 1 if −π < ω < π, and 0 otherwise.
It follows that every function in V₀ is band-limited with cut-off frequency π. In fact, the Sampling Theorem (cf. Section 2.7) states that every such band-limited function f can be reconstructed from its sample values f(k) via

f(t) = Σ_k f(k) sinc(t − k).
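The sampling formula is easy to test numerically. In the sketch below (NumPy, whose np.sinc matches the definition sin(πt)/(πt) above) we take f to be a finite combination of shifted sincs, so that its integer samples are known exactly and the sum has only finitely many nonzero terms:

```python
import numpy as np

# A band-limited test function: f(k) = delta_{k,3} + 2 delta_{k,-1} exactly.
def f(t):
    return np.sinc(t - 3) + 2.0 * np.sinc(t + 1)

t = np.linspace(-4.0, 6.0, 101)
ks = np.arange(-10, 11)

# Shannon reconstruction from the integer samples f(k).
recon = sum(f(k) * np.sinc(t - k) for k in ks)
assert np.allclose(recon, f(t), atol=1e-12)
```

For a generic band-limited f the sum converges only slowly (the slow decay of sinc mentioned above), which is one more reason the sinc basis is not used in practice.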
Exercises 4.2
4.5. Show that f(t) ∈ V_j ⇔ f(2^{-j} t) ∈ V₀.
4.6. Verify that {ϕj,k }k∈Z is a Riesz basis for Vj for fixed j, given that
{ϕ0,k }k∈Z is a Riesz basis for V0 .
4.7. Derive the scaling equation for the spline scaling functions. Hint: Work in the Fourier domain and show that ϕ̂(ω) = H(ω/2) ϕ̂(ω/2), where H(ω) is 2π-periodic. Calculate the coefficients (h_k).
4.8. Derive the scaling equation for the sinc scaling function. Hint: Work
as in the previous problem.
4.9. Show that the scaling equation can be written more generally as

ϕ_{j,k} = √2 Σ_l h_l ϕ_{j+1,l+2k}.

4.11. Verify (4.12). Why do we need ϕ̂(0) = 1?
d0 = f1 − f0 .
(4.13) s_{0,k} = ½ (s_{1,2k} + s_{1,2k+1}).
Figure 4.5: The Haar wavelet and a function in the corresponding W0 space
(4.14) w_{0,k} = ½ (s_{1,2k} − s_{1,2k+1}).
[Figure: on each interval (k, k + 1), the two fine-scale averages s_{1,2k} and s_{1,2k+1} are split into the mean s_{0,k} and the details ±w_{0,k}.]
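The splitting (4.13)–(4.14) and its inversion can be sketched in a few lines (Python/NumPy; the sample sequence is arbitrary):

```python
import numpy as np

# Fine-scale Haar coefficients s1 (any sequence of even length).
s1 = np.array([4.0, 2.0, 5.0, 7.0, 1.0, 1.0])

s0 = 0.5 * (s1[0::2] + s1[1::2])   # (4.13): averages
w0 = 0.5 * (s1[0::2] - s1[1::2])   # (4.14): differences

# Inversion: s_{1,2k} = s_{0,k} + w_{0,k} and s_{1,2k+1} = s_{0,k} - w_{0,k}.
rec = np.empty_like(s1)
rec[0::2] = s0 + w0
rec[1::2] = s0 - w0
assert np.array_equal(rec, s1)
print(s0, w0)   # [3. 6. 1.] [ 1. -1.  0.]
```

Nothing is lost: the pair (s₀, w₀) carries exactly the same information as s₁.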
Example 4.5. For the sinc scaling function from Example 4.3, we choose
the wavelet as
ψ̂(ω) = 1 if π < |ω| < 2π, and 0 otherwise.
It is easy to see that ψ(t) = 2 sinc 2t − sinc t (Exercise 4.14). This is the
sinc wavelet. The space W0 will be the set of functions that are band-
limited to the frequency band π < |ω| < 2π. More generally, the space
Wj will contain all functions that are band-limited to the frequency band
2j π < |ω| < 2j+1 π.
Figure 4.7: The hat scaling function and a piecewise linear wavelet spanning
W0 .
(4.16) ψ(t) = 2 Σ_k g_k ϕ(2t − k),

for some coefficients (g_k). This is the wavelet equation. A Fourier transform gives

(4.17) ψ̂(ω) = G(ω/2) ϕ̂(ω/2),

where

G(ω) = Σ_k g_k e^{−ikω}.
Using ϕ̂(0) = 1 and ψ̂(0) = 0, we get that G(0) = Σ_k g_k = 0. Thus the coefficients (g_k) can be interpreted as a difference filter. Later we will see that also G(π) = 1 holds, and G is in fact a highpass filter. The wavelet and all its properties are determined by this filter, given the scaling function.
The detail spaces W_j are defined as the set of functions of the form

(4.18) d_j(t) = Σ_k w_{j,k} ψ_{j,k}(t).
Using the fourth condition in the definition of MRA one can show that
fj0 goes to 0 in L2 when j0 → −∞. The third condition now implies that,
choosing J larger and larger, we can approximate a function f with approx-
imations fJ that become closer and closer to f . Letting J → ∞ therefore
gives us the wavelet decomposition of f:

(4.19) f(t) = Σ_{j,k} w_{j,k} ψ_{j,k}(t).
We have thus indicated how to prove that {ψj,k } is a basis for L2 (R). How-
ever, it still remains to construct the highpass filter G determining the mother
wavelet ψ.
The decomposition Vj+1 = Vj ⊕ Wj above is not unique. There are many
ways to choose the wavelet ψ and the corresponding detail spaces Wj . Each
such choice corresponds to a choice of the highpass filter G. In the next
section, we will describe a special choice, which gives us an orthogonal system
Exercises 4.3
4.12. Verify that the wavelet in Example 4.4 will span the difference between
V1 and V0 .
4.13. Verify that, for j ≠ 0, each function f_{j+1} ∈ V_{j+1} can be written as f_{j+1} = f_j + d_j where d_j ∈ W_j. (For j = 0 the statement is true by definition.)
4.14. Verify that ψ(t) = 2 sinc 2t − sinc t, when ψ is the sinc wavelet in Example 4.5. Hint: Work in the Fourier domain.
Orthogonality Conditions
The first requirement is that the scaling functions ϕ(t − k) constitute an
orthogonal basis for V0 , that is,
∫_{−∞}^{∞} ϕ(t − k) ϕ(t − l) dt = δ_{k,l}.
Using the scaling equation (4.10) we can transform this to a condition on the
coefficients (hk ) (see Exercise 4.15):
(4.20) Σ_l h_l h_{l+2k} = δ_k / 2.
Figure 4.9: Vj ⊥ Wj
For the filter coefficients this means that (see Exercise 4.17)
(4.22) Σ_m h_{m+2k} g_{m+2l} = 0.
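For the Haar filters h = (1/2, 1/2) and g = (1/2, −1/2) (the book's normalization, in which the filters sum to 1 and 0) the conditions (4.20) and (4.22) reduce to two dot products, since for filters of length two only the zero shift is nontrivial; a minimal check:

```python
import numpy as np

h = np.array([0.5, 0.5])    # Haar averaging filter
g = np.array([0.5, -0.5])   # Haar difference filter

# (4.20): sum_l h_l h_{l+2k} = delta_k / 2; here only k = 0 is nontrivial.
assert abs(np.dot(h, h) - 0.5) < 1e-12
# (4.22): sum_m h_{m+2k} g_{m+2l} = 0; here only the zero shift occurs.
assert abs(np.dot(h, g)) < 1e-12
```

Longer filters require checking every even shift, which is conveniently done with `np.convolve(h, g[::-1])` and a look at the even-indexed entries.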
In terms of the filter functions H(ω) and G(ω), (4.20)–(4.22) become
3. ϕ̂(2ω) = H(ω) ϕ̂(ω), for some 2π-periodic function H.

1. Σ_j |ψ̂(2^j ω)|² = 1,

2. Σ_{j≥0} ψ̂(2^j ω) \overline{ψ̂(2^j (ω + 2kπ))} = 0, for all odd integers k,

3. Σ_{j≥1} Σ_k |ψ̂(2^j (ω + 2kπ))|² = 1.
The first two conditions are equivalent to the orthonormal basis property
of ψj,k in L2 (R), and the third then relates to a corresponding scaling func-
tion. Broadly speaking, the first condition means that linear combinations
of ψj,k are dense in L2 (R), and the second then relates to the orthogonality.
Exercises 4.4
4.15. Verify that the condition on the scaling functions ϕ(t − k) to be orthogonal implies (4.20). Hint: First, you may assume l = 0 (why?). Then, show that

δ_k = ∫_{−∞}^{∞} ϕ(t) ϕ(t − k) dt
    = ∫_{−∞}^{∞} (2 Σ_l h_l ϕ(2t − l)) (2 Σ_m h_m ϕ(2t − 2k − m)) dt
    = 4 Σ_{l,m} h_l h_m ∫_{−∞}^{∞} ϕ(2t − l) ϕ(2t − 2k − m) dt
    = 2 Σ_{l,m} h_l h_m ∫_{−∞}^{∞} ϕ(t − l) ϕ(t − 2k − m) dt
    = 2 Σ_l h_l h_{l+2k}.
4.17. Show that the condition on the scaling functions ϕ(t − k) to be orthogonal to the wavelets ψ(t − k) implies (4.22).
f_J(t) = Σ_{j=j_0}^{J−1} Σ_k w_{j,k} ψ_{j,k}(t) + Σ_k s_{j_0,k} ϕ_{j_0,k}(t).
The computation of the coefficients is done with filter banks. For the sake
of simplicity we derive this connection between MRA’s and filter banks in
the orthogonal case. The biorthogonal case is entirely similar, the notation
just becomes a bit more cumbersome.
We assume that we know the scaling coefficients sJ,k = hf, ϕJ,k i of a
function f at a certain finest scale J. In practice, we usually only have sample
values f (2−J k) available, and we have to compute the scaling coefficients
numerically from these sample values. This is known as pre-filtering. It is
common practice to simply replace the scaling coefficients with the sample
values. The effect of doing so, and other aspects of pre-filtering, will be
treated in Chapter 15.
where s_{j,k} = ⟨f, ϕ_{j,k}⟩ and w_{j,k} = ⟨f, ψ_{j,k}⟩. A scalar multiplication on both sides with ϕ_{j,l} together with the orthogonality conditions for scaling functions and wavelets gives us (Exercise 4.18)

s_{j,k} = Σ_l s_{j+1,l} ⟨ϕ_{j+1,l}, ϕ_{j,k}⟩.

Using the scaling equation, we obtain

⟨ϕ_{j+1,l}, ϕ_{j,k}⟩ = √2 Σ_m h_m ⟨ϕ_{j+1,l}, ϕ_{j+1,m+2k}⟩ = √2 h_{l−2k}.

With a similar calculation for the wavelet coefficients, we have derived the formulas

(4.25) s_{j,k} = √2 Σ_l h_{l−2k} s_{j+1,l} and w_{j,k} = √2 Σ_l g_{l−2k} s_{j+1,l}.
Scaling and wavelet coefficients at the coarser scale are thus computed by sending the scaling coefficients at the finer scale through the analysis part of the orthogonal filter bank with lowpass and highpass filters √2 H and √2 G.
Repeating this recursively, starting with the coefficients (sJ,k ), gives the
wavelet coefficients (wj,k ) for j = j0 , . . . , J − 1, and the scaling coefficients
(sj0 ,k ) at the coarsest scale. This recursive scheme is called the Fast Forward
Wavelet Transform. If we start with N scaling coefficients at the finest scale,
the computational effort is roughly 4MN operations, where M is the filter
length. Compare this with the FFT algorithm, where the computational
effort is 2N log N operations.
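A minimal sketch of the Fast Forward Wavelet Transform built directly on (4.25) (Python/NumPy; the periodic boundary handling and the function names are our assumptions, not the book's):

```python
import numpy as np

def analysis_step(s, h, g):
    """One level of (4.25): s_{j,k} = sqrt(2) sum_l h_{l-2k} s_{j+1,l},
    and similarly for w with g. Assumes len(s) is even; the boundary is
    handled by periodic extension."""
    n = len(s)
    s_out = np.zeros(n // 2)
    w_out = np.zeros(n // 2)
    for k in range(n // 2):
        for m, (hm, gm) in enumerate(zip(h, g)):   # l = 2k + m
            s_out[k] += np.sqrt(2) * hm * s[(2 * k + m) % n]
            w_out[k] += np.sqrt(2) * gm * s[(2 * k + m) % n]
    return s_out, w_out

def fwt(s, h, g, levels):
    """Fast Forward Wavelet Transform: repeat the analysis step."""
    details = []
    for _ in range(levels):
        s, w = analysis_step(s, h, g)
        details.append(w)
    return s, details

# Haar filters in the book's normalization (sum h = 1), so sqrt(2) h is ON.
h = np.array([0.5, 0.5]); g = np.array([0.5, -0.5])
sJ = np.array([4.0, 2.0, 5.0, 7.0, 1.0, 1.0, 3.0, 3.0])
s0, ws = fwt(sJ, h, g, levels=3)

# An orthonormal transform preserves the l2 energy of the coefficients.
assert np.isclose(np.sum(s0**2) + sum(np.sum(w**2) for w in ws), np.sum(sJ**2))
```

For N input coefficients and filter length M, each level costs O(MN/2) operations on half as many samples as the previous one, which gives the roughly 4MN total quoted above.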
[Figure: the Fast Forward Wavelet Transform. s_J is fed through the analysis filters H∗ and G∗ with downsampling, giving s_{J−1} and w_{J−1}; repeating this on s_{J−1} gives s_{J−2} and w_{J−2}, and so on.]
From sj0 and wj0 , we can thus reconstruct sj0 +1 , which together with wj0 +1
gives us sj0 +2 and so on, until we finally arrive at the sJ . This is the Fast
Inverse Wavelet Transform, see Figure 4.12. Note that we use neither the
scaling function nor the wavelet explicitly in the forward or inverse wavelet
transform, only the orthogonal filter bank.
To recover the sample values f (2−J k) from the fine-scale coefficients (sJ,k )
we need to do a post-filtering step. This will be be discussed in Chapter 15.
Exercises 4.5
4.18. Verify all the steps in the derivation of the filter equations (4.25).
[Figure 4.12: the Fast Inverse Wavelet Transform. s_{j_0} and w_{j_0} are upsampled and filtered by H and G; their sum gives s_{j_0+1}, which together with w_{j_0+1} gives s_{j_0+2}, and so on.]
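A minimal sketch of one synthesis step (Python/NumPy; the periodic boundary convention and the names are our assumptions). Splitting a short Haar signal one level and merging it again recovers it exactly:

```python
import numpy as np

def synthesis_step(s, w, h, g):
    """One level of the inverse transform:
    s_{j+1,l} = sqrt(2) sum_k ( h_{l-2k} s_{j,k} + g_{l-2k} w_{j,k} ),
    with periodic extension at the boundary."""
    n = 2 * len(s)
    out = np.zeros(n)
    for k in range(len(s)):
        for m, (hm, gm) in enumerate(zip(h, g)):   # l = 2k + m
            out[(2 * k + m) % n] += np.sqrt(2) * (hm * s[k] + gm * w[k])
    return out

# Haar round trip: analyze one level by hand, then synthesize.
h = np.array([0.5, 0.5]); g = np.array([0.5, -0.5])
s1 = np.array([4.0, 2.0, 5.0, 7.0])
s0 = np.sqrt(2) * (h[0] * s1[0::2] + h[1] * s1[1::2])
w0 = np.sqrt(2) * (g[0] * s1[0::2] + g[1] * s1[1::2])
assert np.allclose(synthesis_step(s0, w0, h, g), s1)
```

As the text notes, neither the scaling function nor the wavelet appears anywhere in the computation — only the filter coefficients.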
For the dual mother wavelet holds the dual wavelet equation

(4.27) ψ̃(t) = 2 Σ_k g̃_k ϕ̃(2t − k).
The two latter conditions mean that V_j ⊥ W̃_j and Ṽ_j ⊥ W_j respectively (see Figure 4.13). After some calculations similar to those in the orthogonal case
Figure 4.13: V_j ⊥ W̃_j and Ṽ_j ⊥ W_j
(4.28)
H̃(ω)H(ω) + H̃(ω + π)H(ω + π) = 1,
G̃(ω)G(ω) + G̃(ω + π)G(ω + π) = 1,
G̃(ω)H(ω) + G̃(ω + π)H(ω + π) = 0,
H̃(ω)G(ω) + H̃(ω + π)G(ω + π) = 0.

(4.29)
H̃(ω)H(ω) + G̃(ω)G(ω) = 1,
H̃(ω)H(ω + π) + G̃(ω)G(ω + π) = 0.
Hence, the four equations in (4.28) can actually be reduced to two. The latter equations are the perfect reconstruction conditions (3.8)–(3.9) for filter banks, transformed to the Fourier domain (Exercise 4.21). It means that √2 H̃, √2 H, √2 G, and √2 G̃ are the low- and highpass filters in a biorthogonal filter bank.
We will now derive a connection between the low- and highpass filters.
Cramer’s rule gives us (see Exercise 4.22)
e G(ω + π) e H(ω + π)
(4.30) H(ω) = and G(ω) = ,
∆(ω) ∆(ω)
where ∆(ω) = det M(ω). In practice we often want finite filters, which corresponds to wavelets and scaling functions having compact support. Then one can show that ∆(ω) has the form ∆(ω) = C e^{−iLω} for some odd integer L and constant C with |C| = 1. Different choices give essentially the same
wavelet: the only thing that differs is an integer translation and the constant
C. A common choice is C = 1 and L = 1 which gives the alternating flip
construction (cf. Equation 3.10):
g_k = (−1)^k h̃_{1−k} and g̃_k = (−1)^k h_{1−k}.
This leaves us with the task of designing proper lowpass filters that satisfy
the first equation in (4.28). It is then easy to verify (Exercise 4.23) that the
remaining equations are satisfied.
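The alternating flip is easy to verify numerically. The sketch below (NumPy) uses the orthogonal Daubechies-4 lowpass filter from Example 3.4, for which h̃ = h; shifting the flipped filter onto the support 0..3 is our convention (an even shift, which does not affect the filter bank conditions):

```python
import numpy as np

# Daubechies-4 lowpass filter, normalized so that sum(h) = sqrt(2).
h = np.poly([-1.0, -1.0, 2.0 - np.sqrt(3.0)])
h *= np.sqrt(2.0) / h.sum()

# Alternating flip g_k = (-1)^k h_{1-k}, moved by an even shift so that
# g lives on the same support 0..3 as h.
g = np.array([(-1)**k * h[len(h) - 1 - k] for k in range(len(h))])

# The cross-correlation of h and g must vanish at all even lags
# (orthogonality between the lowpass and highpass channels).
xc = np.convolve(h, g[::-1])
center = len(h) - 1
assert all(abs(xc[center + 2 * l]) < 1e-10 for l in (-1, 0, 1))
```

The same two-line flip produces the highpass pair in the biorthogonal case as well, starting from two different lowpass filters h and h̃.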
We mentioned above that finite filters correspond to wavelets and scaling functions having compact support. This is based on the Paley-Wiener theorem. Taking this as a fact, and assuming that the lowpass filters have lengths M and M̃, it is straightforward to show that ϕ and ϕ̃ are zero outside intervals of length M − 1 and M̃ − 1 (Exercise 4.24). Further, both ψ and ψ̃ are supported on intervals of length (M + M̃ − 2)/2.
Note that we take inner products with the dual scaling functions ϕ̃_{J,k}. This is a non-orthogonal projection onto V_J, along Ṽ_J^⊥.
The scaling coefficients s_{j,k} = ⟨f, ϕ̃_{j,k}⟩ and wavelet coefficients w_{j,k} = ⟨f, ψ̃_{j,k}⟩ are computed by feeding the scaling coefficients s_{j+1,k} = ⟨f, ϕ̃_{j+1,k}⟩ into the biorthogonal filter bank:

(4.32) s_{j,k} = √2 Σ_l h̃_{l−2k} s_{j+1,l} and w_{j,k} = √2 Σ_l g̃_{l−2k} s_{j+1,l}.
Exercises 4.6
4.20. Derive one of the equations in (4.28), using the same kind of calcu-
lations as in the orthogonal case. Start with deriving the corresponding
identity for the filter coefficients.
4.21. Verify that (4.29) are the perfect reconstruction conditions for filter banks.
4.24. Let the lowpass filters be FIR with filter lengths M and M̃. Assume that ϕ and ϕ̃ are zero outside [0, A] and [0, Ã]. Use the scaling equations to show that A = M − 1 and Ã = M̃ − 1. Then, use the wavelet equations to show that both ψ and ψ̃ are zero outside [0, (M + M̃ − 2)/2].
‖f − P_j f‖ ≤ C 2^{−jα} ‖D^α f‖
Vanishing Moments
The polynomial reproducing property (4.34) is maybe not so interesting in its own right, but rather since it is connected to the dual wavelets having vanishing moments. If t^α ∈ V_j,³ we then have t^α ⊥ W̃_j, since V_j ⊥ W̃_j. This

³With a slight abuse of notation, since t^α does not belong to L²(R)
means that ⟨t^α, ψ̃_{j,k}⟩ = 0, for every wavelet ψ̃_{j,k}. Written out more explicitly, we have

∫ t^α ψ̃_{j,k}(t) dt = 0, for α = 0, . . . , N − 1.

We say that the dual wavelets have N vanishing moments. Having N vanishing moments can equivalently be stated as the Fourier transform having a zero of order N at ω = 0,

D^α \widehat{ψ̃}(0) = 0, for α = 0, . . . , N − 1.
Using the relation \widehat{ψ̃}(2ω) = G̃(ω) \widehat{ϕ̃}(ω) and \widehat{ϕ̃}(0) = 1, we see that G̃(ω) must have a zero of order N at ω = 0. From (4.30) it then follows that H(ω) must be of the form

(4.35) H(ω) = ((e^{−iω} + 1)/2)^N Q(ω),

for some 2π-periodic function Q(ω). The larger N we choose, the sharper transition between pass-band and stop-band we get.
In the same way, we can show that the filter function H̃(ω) must be of the form

(4.36) H̃(ω) = ((e^{−iω} + 1)/2)^Ñ Q̃(ω),
and we get

⟨f, ψ̃_{j,k}⟩ = ∫ f(t) ψ̃_{j,k}(t) dt = ∫ P_{α−1}(t) ψ̃_{j,k}(t) dt + O(2^{−jα}) = O(2^{−jα}),
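In the orthogonal case ψ̃ = ψ, so the vanishing moments can be checked on the highpass filter itself: a filter whose transfer function has a zero of order N at ω = 0 annihilates the discrete monomials k^α for α < N. A minimal check with the Daubechies-4 filter (N = 2; the even shift used when flipping the filter is our convention):

```python
import numpy as np

# Daubechies-4 lowpass (sum = sqrt(2)) and its alternating-flip highpass.
h = np.poly([-1.0, -1.0, 2.0 - np.sqrt(3.0)])
h *= np.sqrt(2.0) / h.sum()
g = np.array([(-1)**k * h[len(h) - 1 - k] for k in range(len(h))])

# G has a zero of order N = 2 at omega = 0, so the filter kills the
# discrete moments of order 0 and 1.
k = np.arange(len(g))
for alpha in range(2):
    assert abs(np.sum(k**alpha * g)) < 1e-10
```

This is the discrete counterpart of the O(2^{−jα}) decay above: smooth parts of a signal produce wavelet coefficients that are nearly zero, which is what makes wavelet-based compression work.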
Exercises 4.7
4.26. Assuming that the function H(ω) has a zero of order N at ω = π, show, using scaling, that D^α ϕ̂(2kπ) = 0 for all integers k ≠ 0 and 0 ≤ α ≤ N − 1. Show further that D^α \widehat{ψ̃}(4kπ) = 0 for all integers k and 0 ≤ α ≤ N − 1.
4.27. Show, using Poisson's summation formula and the previous exercise, that for 0 ≤ α ≤ N − 1

Σ_k k^α ϕ(t − k) = t^α

when the function H(ω) has a zero of order N at ω = π. The scaling function ϕ thus reproduces polynomials of degree at most N − 1.
4.28. Prove, using Poisson's summation formula and scaling, the two identities

Σ_k (−1)^k ψ(t − k) = Σ_l ψ(t/2 − l + 1/4),

Σ_k (−1)^k ϕ(t − k) = Σ_l (−1)^l ψ(t − l − 1/2).
4.8 Notes
The idea of approximating an image at different scales, and storing the difference between these approximations, appeared already in the pyramid algorithm of Burt and Adelson in 1983. At the same time, the theory of wavelets had
started to make progress, and several wavelet bases had been constructed,
among others by the French mathematician Yves Meyer. This made the
French engineer Stephane Mallat realize a connection between wavelets and
filter banks. Together with Meyer he formulated the definition of multires-
olution analyses. This connection led to a breakthrough in wavelet theory,
since it gave both new constructions of wavelet bases and fast algorithms.
The Belgian mathematician Ingrid Daubechies constructed the first family
of wavelets within this new framework in 1988, and many different wavelets
have been constructed since.
The overview article by Jawerth and Sweldens [20] is a good start for
further reading and understanding of MRA and wavelets. It also contains
an extensive reference list. The books Ten Lectures on Wavelets by Ingrid
Daubechies [11] and A First Course on Wavelets by Hernandez & Weiss [16],
give a more mathematically complete description. Among other things, they
contain conditions on low- and highpass filters to generate Riesz bases of
wavelets. A detailed discussion about this can also be found in the book by
Strang & Nguyen [27].
Chapter 5

Wavelets in Higher Dimensions
mean value

(5.1) s_{0,k} = ¼ (s_{1,2k} + s_{1,2k+e_x} + s_{1,2k+e_y} + s_{1,2k+e}),

where e_x = (1, 0), e_y = (0, 1), and e = (1, 1).
[Figure 5.1: the fine-scale coefficients s_{1,k} on a 4 × 4 grid (left) and the coarse-scale coefficients s_{0,k} on the corresponding 2 × 2 grid (right).]
w^H_{0,k} = ¼ (s_{1,2k} + s_{1,2k+e_x} − s_{1,2k+e_y} − s_{1,2k+e}),
w^V_{0,k} = ¼ (s_{1,2k} − s_{1,2k+e_x} + s_{1,2k+e_y} − s_{1,2k+e}),
w^D_{0,k} = ¼ (s_{1,2k} − s_{1,2k+e_x} − s_{1,2k+e_y} + s_{1,2k+e}).
In Figure 5.2, we have sketched the 'polarity' of these differences, and the averaging. The superscripts H, V, D are shorthands for horizontal, vertical,
[Figure 5.2: the polarity patterns of the averaging and of the three differences on the 2 × 2 block s_{1,2k}, s_{1,2k+e_x}, s_{1,2k+e_y}, s_{1,2k+e}.]
The averaging in Equation (5.1) is the low-pass filtering step in the two-dimensional Haar transform. It can be decomposed into two one-dimensional lowpass filtering steps. First, we apply a lowpass filtering with subsampling in the x-direction, which gives us

½ (s_{1,2k} + s_{1,2k+e_x}) and ½ (s_{1,2k+e_y} + s_{1,2k+e}).

Averaging these two, that is, averaging in the y-direction, then gives us s_{0,k}.
Further, the three wavelet coefficient sequences are the result of applying low-
and highpass filters in the x- and y-directions. The whole process is shown in
Figure 5.3, where H and G denote the Haar filters. The superscripts x and y
indicate that filtering and subsampling are done in the x- and y-directions.
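One level of the two-dimensional Haar transform can be sketched directly from the averaging and difference formulas (Python/NumPy; taking the first array axis as the y-direction is our convention):

```python
import numpy as np

# A small test image of fine-scale coefficients s1.
s1 = np.array([[4.0, 2.0, 6.0, 0.0],
               [2.0, 4.0, 2.0, 2.0],
               [1.0, 3.0, 5.0, 5.0],
               [3.0, 1.0, 1.0, 3.0]])

a = s1[0::2, 0::2]   # s_{1,2k}
b = s1[0::2, 1::2]   # s_{1,2k+ex}
c = s1[1::2, 0::2]   # s_{1,2k+ey}
d = s1[1::2, 1::2]   # s_{1,2k+e}

s0 = (a + b + c + d) / 4    # (5.1): averages
wH = (a + b - c - d) / 4    # horizontal details
wV = (a - b + c - d) / 4    # vertical details
wD = (a - b - c + d) / 4    # diagonal details

# Each 2x2 block is recovered from its average plus the three details.
rec = np.empty_like(s1)
rec[0::2, 0::2] = s0 + wH + wV + wD
rec[0::2, 1::2] = s0 + wH - wV - wD
rec[1::2, 0::2] = s0 - wH + wV - wD
rec[1::2, 1::2] = s0 - wH - wV + wD
assert np.array_equal(rec, s1)
```

The slicing into a, b, c, d is exactly the x-then-y subsampling described above, just done in one step.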
[Figure 5.3: the two-dimensional Haar transform as two one-dimensional filtering steps. s₁ is filtered and subsampled by H_x∗ or G_x∗ in the x-direction, and each result by H_y∗ or G_y∗ in the y-direction, giving s₀, w₀^H, w₀^V, and w₀^D.]
H = H_x H_y,
G^H = H_x G_y,
G^V = G_x H_y,
G^D = G_x G_y.
Analysis filters H̃, G̃^H, G̃^V, and G̃^D are defined analogously. For notational convenience, from now on, we let the operators H_x, H_y, etc. include upsampling. For instance, H_x is an upsampling in the x-direction followed by a filtering in the x-direction. Given scaling coefficients s_{j+1} = (s_{j+1,k})_{k∈Z²}, we compute averages and wavelet coefficients
(5.2)
s_j = H̃∗ s_{j+1} = H̃_y∗ H̃_x∗ s_{j+1},
w^H_j = G̃^H∗ s_{j+1} = G̃_y∗ H̃_x∗ s_{j+1},
w^V_j = G̃^V∗ s_{j+1} = H̃_y∗ G̃_x∗ s_{j+1},
w^D_j = G̃^D∗ s_{j+1} = G̃_y∗ G̃_x∗ s_{j+1}.
We thus compute scaling and wavelet coefficients by first applying the analysis filter bank on the rows of s_{j+1} to get H̃_x∗ s_{j+1} and G̃_x∗ s_{j+1}. We then apply the analysis filter bank along the columns of these, see Figure 5.4.

[Figure 5.4: one step of the separable transform. s_{j+1} is split along x into H̃_x∗ s_{j+1} and G̃_x∗ s_{j+1}, which are then split along y into s_j, w^H_j, w^V_j, and w^D_j.]
To reconstruct s_{j+1} from the scaling and wavelet coefficients at the coarser scale, we reverse the whole process in Figure 5.4. Thus, we first apply the synthesis filters in the y-direction to recover H̃_x∗ s_{j+1} and G̃_x∗ s_{j+1}:

H̃_x∗ s_{j+1} = H_y s_j + G_y w^H_j,
G̃_x∗ s_{j+1} = H_y w^V_j + G_y w^D_j.

We then apply the synthesis filters in the x-direction:

s_{j+1} = H_x (H̃_x∗ s_{j+1}) + G_x (G̃_x∗ s_{j+1})
        = H_x H_y s_j + H_x G_y w^H_j + G_x H_y w^V_j + G_x G_y w^D_j
        = H s_j + G^H w^H_j + G^V w^V_j + G^D w^D_j.
[Figure 5.5: the arrangement of the coefficients. s_J is split into s_{J−1} and w^H_{J−1}, w^V_{J−1}, w^D_{J−1}; then s_{J−1} is split into s_{J−2} and w^H_{J−2}, w^V_{J−2}, w^D_{J−2}, and so on.]
The wavelet coefficients are computed analogously using the highpass filters g̃^H_k = h̃_{k_x} g̃_{k_y}, g̃^V_k = g̃_{k_x} h̃_{k_y}, and g̃^D_k = g̃_{k_x} g̃_{k_y}. Further, the reconstruction of s_{j+1} can be seen as applying two-dimensional filters after a two-dimensional upsampling. This upsampling operator inserts zeroes as follows:
            ⋮         ⋮      ⋮        ⋮      ⋮
· · ·   s_{−1,1}    0   s_{0,1}    0   s_{1,1}   · · ·
· · ·      0        0      0       0      0      · · ·
(↑2)² s = · · ·   s_{−1,0}    0   s_{0,0}    0   s_{1,0}   · · ·
· · ·      0        0      0       0      0      · · ·
· · ·   s_{−1,−1}   0   s_{0,−1}   0   s_{1,−1}  · · ·
            ⋮         ⋮      ⋮        ⋮      ⋮
(5.3)
H(ξ, η) = H(ξ)H(η),
G^H(ξ, η) = H(ξ)G(η),
G^V(ξ, η) = G(ξ)H(η),
G^D(ξ, η) = G(ξ)G(η).
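The separability in (5.3) can be checked numerically: a two-dimensional filter built as an outer product of one-dimensional filters has a frequency response that factors. A sketch with the Haar filters (NumPy; treating rows as the y-direction and the helper function names are our conventions):

```python
import numpy as np

h = np.array([0.5, 0.5])    # 1-D lowpass
g = np.array([0.5, -0.5])   # 1-D highpass

def freq1(c, w):
    """1-D transfer function C(w) = sum_k c_k e^{-ikw}."""
    return np.sum(c * np.exp(-1j * w * np.arange(len(c))))

def freq2(c, xi, eta):
    """2-D transfer function C(xi, eta) with rows indexed by k_y."""
    ky, kx = np.indices(c.shape)
    return np.sum(c * np.exp(-1j * (xi * kx + eta * ky)))

# G^H has coefficients h_{kx} g_{ky}: as an array, rows carry g, columns h.
GH = np.outer(g, h)
xi, eta = 0.7, 2.1
assert np.isclose(freq2(GH, xi, eta), freq1(h, xi) * freq1(g, eta))
```

The same factorization holds for H, G^V, and G^D, which is what makes the separable transform as cheap as two passes of the one-dimensional one.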
Ideally, H(ω) equals 1 on [0, π/2] and 0 on [π/2, π], and G(ω) equals 1 on [π/2, π] and 0 on [0, π/2]. Then, H(ξ, η) and G^ν(ξ, η) will decompose the frequency domain as in Figure 5.6. For non-ideal filters, there is of course overlapping between the frequency regions.
[Figure 5.6: the ideal decomposition of the frequency plane: H occupies [0, π/2]², G^H the region ξ < π/2 < η, G^V the region η < π/2 < ξ, and G^D the square [π/2, π]².]
Exercises 5.1
5.1. Verify that the wavelet coefficients in the two-dimensional Haar transform are given by two one-dimensional filtering steps as described in Figure 5.3.
5.2. Work out the construction of separable filters in three dimensions. How
many highpass filters will there be? (7)
5.3. Show that the two-dimensional downsampling operator can be written as (↓2)² = (↓2)_x (↓2)_y, where (↓2)_x and (↓2)_y are the one-dimensional downsampling operators in the x- and y-directions. Also show that (↑2)² = (↑2)_x (↑2)_y.
This is so, because Φ(2x − k_x, 2y − k_y) is one for k_x/2 < x < (k_x + 1)/2, k_y/2 < y < (k_y + 1)/2 and zero otherwise. The coarser approximation f₀ can similarly be written as

f₀(x, y) = Σ_k s_{0,k} Φ(x − k_x, y − k_y).
This is also how wavelets and scaling functions are defined for general separable wavelet bases. We define dilated, translated, and normalized scaling functions by

Φ_{j,k}(x, y) = 2^j Φ(2^j x − k_x, 2^j y − k_y),

and similarly for the wavelets. Note that we need 2^j as the normalizing
factor in two dimensions. We define approximation spaces Vj as the set of
all functions of the form
$$f_j(x,y) = \sum_k s_{j,k}\, \Phi_{j,k}(x,y).$$
Detail spaces WjH , WjV , and WjD are defined analogously. The scaling func-
tion satisfies the scaling equation
$$\Phi(x,y) = 4 \sum_k h_k\, \Phi(2x - k_x,\, 2y - k_y).$$
This implies that Vj ⊂ Vj+1 and Wjν ⊂ Vj+1 . Each fj+1 ∈ Vj+1 can be
decomposed as
$$f_{j+1} = f_j + d^H_j + d^V_j + d^D_j, \tag{5.5}$$
and
$$f_j(x,y) = \sum_k s_{j,k}\, \Phi_{j,k}(x,y) \quad\text{and}\quad d^\nu_j(x,y) = \sum_k w^\nu_{j,k}\, \Psi^\nu_{j,k}(x,y). \tag{5.7}$$
To switch between (5.6) and (5.7) we use the analysis and synthesis filter
banks (5.2) and (5.1).
Finally, we also have dual scaling functions, wavelets, approximation and
detail spaces with the same properties described above. Together with the
primal scaling functions and wavelets they satisfy biorthogonality conditions
92 CHAPTER 5. WAVELETS IN HIGHER DIMENSIONS
Defining the modulation matrix
$$M(\xi,\eta) = \begin{pmatrix}
H(\xi,\eta) & H(\xi+\pi,\eta) & H(\xi,\eta+\pi) & H(\xi+\pi,\eta+\pi)\\
G^H(\xi,\eta) & G^H(\xi+\pi,\eta) & G^H(\xi,\eta+\pi) & G^H(\xi+\pi,\eta+\pi)\\
G^V(\xi,\eta) & G^V(\xi+\pi,\eta) & G^V(\xi,\eta+\pi) & G^V(\xi+\pi,\eta+\pi)\\
G^D(\xi,\eta) & G^D(\xi+\pi,\eta) & G^D(\xi,\eta+\pi) & G^D(\xi+\pi,\eta+\pi)
\end{pmatrix},$$
and $\widetilde M(\xi,\eta)$ similarly, we get
$$M(\xi,\eta)\,\widetilde M(\xi,\eta)^t = I. \tag{5.8}$$
It is easy to verify that the separable filters satisfy these equations. However,
it is also possible to use these equations as the starting point for the con-
struction of two-dimensional, non-separable wavelets. We will not explore
this topic further, instead we take a look at two other, entirely different
constructions of non-separable wavelets in the next section.
Exercises 5.2
5.4. Verify Equation (5.4).
5.5. Show that Φj,k (x, y) = ϕj,kx (x)ϕj,ky (y).
5.6. Show that the scaling function satisfies the scaling equation
$$\Phi(x,y) = 4 \sum_k h_k\, \Phi(2x - k_x,\, 2y - k_y), \quad\text{where } h_k = h_{k_x} h_{k_y}.$$
Figure 5.7: The frequency plane decomposition for the separable wavelet
transform.
5.7. Show that Vj ⊂ Vj+1. Hint: Use the scaling equation to show that the scaling functions Φj,k belong to Vj+1. From there, conclude that each finite sum
$$\sum_{k=-K}^{K} s_{j,k}\, \Phi_{j,k}(x,y)$$
belongs to Vj+1. Finally, since Vj+1 is closed, it follows that all corresponding infinite sums, that is, elements of Vj, belong to Vj+1.
5.8. Show that we can decompose fj+1 as in (5.5). Hint: Show that each
scaling function Φj+1,k can be decomposed in this way.
5.9. Show that the separable filters satisfy the biorthogonality conditions (5.8).
Quincunx Wavelets
For separable wavelets, the dilated, translated, and normalized scaling functions can be written as
$$\Phi_{j,k}(x) = 2^j\, \Phi(2^j x - k), \quad k \in \mathbb{Z}^2,$$
Figure 5.8: Sampling lattice (x’s) and subsampling lattice (o’s) for the Quin-
cunx and the separable case.
and that we need three different wavelets. Dilated and translated wavelets can now be defined by
$$\psi_{j,k}(x) = 2^{j/2}\, \psi(D^j x - k), \quad k \in \mathbb{Z}^2,$$
where D is the Quincunx dilation matrix.
Again, the scaling function and wavelet are completely determined by the
low and highpass filter coefficients (hk ) and (gk ). In the biorthogonal case,
we also have dual scaling functions and wavelets, satisfying dual scaling and wavelet equations with dual filters $(\tilde h_k)$ and $(\tilde g_k)$. Biorthogonality conditions on wavelets and scaling functions can be transformed into conditions on the filters. In the frequency domain they become
$$\begin{aligned}
\widetilde H(\xi_1,\xi_2)H(\xi_1,\xi_2) + \widetilde H(\xi_1+\pi,\xi_2+\pi)H(\xi_1+\pi,\xi_2+\pi) &= 1,\\
\widetilde G(\xi_1,\xi_2)G(\xi_1,\xi_2) + \widetilde G(\xi_1+\pi,\xi_2+\pi)G(\xi_1+\pi,\xi_2+\pi) &= 1,\\
\widetilde G(\xi_1,\xi_2)H(\xi_1,\xi_2) + \widetilde G(\xi_1+\pi,\xi_2+\pi)H(\xi_1+\pi,\xi_2+\pi) &= 0,\\
\widetilde H(\xi_1,\xi_2)G(\xi_1,\xi_2) + \widetilde H(\xi_1+\pi,\xi_2+\pi)G(\xi_1+\pi,\xi_2+\pi) &= 0.
\end{aligned} \tag{5.9}$$
$$s_j = (\downarrow 2)_D\, \widetilde H^* s_{j+1} \quad\text{and}\quad w_j = (\downarrow 2)_D\, \widetilde G^* s_{j+1}.$$
The downsampling operator (↓ 2)D removes all coefficients but those with
indices k ∈ DZ2 . These are sometimes referred to as even indices and all
other indices are called odd. In the Inverse Wavelet Transform we successively
recover sj+1 from sj and wj, using the synthesis filters,
$$s_{j+1} = H (\uparrow 2)_D\, s_j + G (\uparrow 2)_D\, w_j,$$
where the upsampling operator $(\uparrow 2)_D$ interleaves zeros at the odd indices.
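The quincunx sublattice can be sketched as follows (our illustration; the dilation matrix D = [[1, 1], [1, -1]] is one common choice, and we zero out the odd indices rather than re-index onto the sublattice, to keep the picture simple):

```python
import numpy as np

def quincunx_mask(shape):
    """Boolean mask of the 'even' indices k ∈ DZ² for D = [[1,1],[1,-1]]:
    these are exactly the indices with k_x + k_y even."""
    kx, ky = np.indices(shape)
    return (kx + ky) % 2 == 0

def downsample_quincunx(s):
    """(↓2)_D sketched by zeroing the odd indices instead of removing
    them (a simplification for illustration)."""
    out = s.copy()
    out[~quincunx_mask(s.shape)] = 0.0
    return out

s = np.arange(16, dtype=float).reshape(4, 4)
sd = downsample_quincunx(s)
# exactly half of the samples lie on the quincunx sublattice
assert quincunx_mask(s.shape).sum() == s.size // 2
```

Since |det D| = 2, one sample in two survives each downsampling step, just as in the one-dimensional dyadic case.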
Figure 5.9: The ideal frequency plane decomposition for Quincunx wavelets.
This is in contrast with the separable case, where the same rotation divides the frequency channel W^D into W^H and W^V, and the two latter are mixed into W^D. In Figure 5.9, we have plotted the frequency plane decomposition for Quincunx wavelets.
Hexagonal Wavelets
Quincunx wavelets achieve more rotational invariance by including a rotation
in the dilation matrix. Another method is to use a sampling lattice different
from Z2 . One alternative is to use the hexagonal lattice Γ in Figure 5.10.
Figure 5.10: The Hexagonal Lattice (left) and the sampling frequency regions
(right) for rectangular and hexagonal sampling.
lattice, assuming that ϕ is centred around the origin. The dilated, translated, and normalized scaling functions are defined as
5.4 Notes
In the paper [9] by Cohen and Daubechies, several Quincunx wavelet bases
are constructed and their regularity is investigated. An interesting fact is that
for D being a rotation, orthogonal wavelets can at most be continuous, while
102 CHAPTER 6. THE LIFTING SCHEME
[Figure: the lifting step. The input s_{j+1} is filtered by h* and g*; the prediction p s_j is subtracted from the highpass output w_j, giving s_j^n = s_j and w_j^n = w_j − p s_j.]
so $s_j^n$ and $w_j^n$ are the result of applying 'new' filters to $s_{j+1}$, where
$$h^n = h \quad\text{and}\quad g^n = g - h p^*.$$
To recover sj+1 from snj and wjn we simply proceed as in Figure 6.2. This
amounts to applying new (lifted) synthesis filters
$$\tilde h^n = \tilde h + \tilde g\, p \quad\text{and}\quad \tilde g^n = \tilde g,$$
and computing
$$s_{j+1} = \tilde h^n s_j^n + \tilde g^n w_j^n.$$
The connection between the original and lifted filters in the Fourier do-
main can be written out in matrix form
$$\begin{pmatrix} \tilde h^n(\omega) \\ \tilde g^n(\omega) \end{pmatrix} = \begin{pmatrix} 1 & s(\omega) \\ 0 & 1 \end{pmatrix} \begin{pmatrix} \tilde h(\omega) \\ \tilde g(\omega) \end{pmatrix}, \tag{6.1a}$$
$$\begin{pmatrix} h^n(\omega) \\ g^n(\omega) \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ -s(\omega) & 1 \end{pmatrix} \begin{pmatrix} h(\omega) \\ g(\omega) \end{pmatrix}. \tag{6.1b}$$
[Figure 6.2: the corresponding synthesis step. The prediction p s_j^n is added back to w_j^n to recover w_j, and s_{j+1} is then synthesized from s_j = s_j^n and w_j with the filters h̃ and g̃.]
To motivate the lifting step, let us consider a simple example, where the
initial filters are the lazy filters, that is,
$$s_{j,k} = s_{j+1,2k} \quad\text{and}\quad w_{j,k} = s_{j+1,2k+1}.$$
From a compression point of view, this is not a useful filter pair, since there
are no reasons to expect many wavelet coefficients to be small. This is where
the prediction filter p enters the scene. We try to predict the odd-indexed
scaling coefficients from the even-indexed via linear interpolation:
$$\hat s_{j+1,2k+1} = \frac{1}{2}(s_{j+1,2k} + s_{j+1,2k+2}), \quad\text{or}\quad \hat w_{j,k} = (p\, s_j)_k = \frac{1}{2}(s_{j,k} + s_{j,k+1}).$$
The new wavelet coefficients then become the prediction errors
$$\begin{aligned}
w^n_{j,k} &= w_{j,k} - \hat w_{j,k} \\
&= s_{j+1,2k+1} - \frac{1}{2}(s_{j+1,2k} + s_{j+1,2k+2}) \\
&= -\frac{1}{2}\, s_{j+1,2k} + s_{j+1,2k+1} - \frac{1}{2}\, s_{j+1,2k+2}.
\end{aligned}$$
We see that the new highpass filter is g0n = g2n = −1/2, g1n = 1, all other
gkn = 0.
In regions where the signal is smooth, the prediction can be expected to
be accurate and thus the corresponding wavelet coefficients will be small. A
more detailed analysis shows that the lifting step increases the number of
vanishing moments, from zero in the lazy wavelet transform to two vanishing
moments (linear polynomials will have zero wavelet coefficients).
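The prediction step can be sketched as follows (our illustration; the edge handling, replicating the last even sample, is our own choice and not part of the text):

```python
import numpy as np

def lazy_split(x):
    """Lazy filters: even samples -> scaling coeffs, odd -> wavelet coeffs."""
    return x[::2].astype(float), x[1::2].astype(float)

def predict(s, w):
    """Lifting step: predict each odd sample by linear interpolation of
    its even neighbours and keep only the prediction error."""
    right = np.append(s[1:], s[-1])  # edge: replicate last even sample
    return w - 0.5 * (s + right)

x = np.arange(10, dtype=float)  # samples of a linear signal
s, w = lazy_split(x)
wn = predict(s, w)
# two vanishing moments: a linear signal gives zero interior coefficients
assert np.allclose(wn[:-1], 0.0)
```

Only the last coefficient is nonzero here, and only because of the boundary choice; in the interior, linear polynomials are predicted exactly.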
Dual Lifting
After the lifting step, the wavelet coefficients changed, but the scaling coef-
ficients were left unaltered. They can be updated with a dual lifting step as
$$s^n_j = s_j + u\, w^n_j.$$
[Figure: predict and update. After the prediction step p, the update filter u is applied to w_j^n and the result is added to s_j, giving s_j^n = s_j + u w_j^n.]
This is in fact equivalent to having one dual vanishing moment. We will not motivate this any further; instead we go on and make the Ansatz
$$\begin{aligned}
s^n_{j,k} &= s_{j,k} + A\, w^n_{j,k} + B\, w^n_{j,k-1} \\
&= -\tfrac{B}{2}\, s_{j+1,2k-2} + B\, s_{j+1,2k-1} + \big(1 - \tfrac{A}{2} - \tfrac{B}{2}\big)\, s_{j+1,2k} + A\, s_{j+1,2k+1} - \tfrac{A}{2}\, s_{j+1,2k+2}.
\end{aligned}$$
It is not hard to show that A = B = 1/4 gives the constant mean value property. In this case, the new lowpass synthesis filter becomes $h^n_{-2} = h^n_2 = -1/8$, $h^n_{-1} = h^n_1 = 1/4$, $h^n_0 = 3/4$, all other $h^n_k = 0$. The new filters $(h^n_k)$ and $(g^n_k)$ are the analysis filters associated with the hat scaling function and the wavelet in Figure 4.7.
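Putting the predict and update steps together gives a complete analysis step (our sketch; edges are handled by simple replication/reflection, which is one choice among several):

```python
import numpy as np

def lifted_analysis(x):
    """Lazy split + linear prediction + update with A = B = 1/4,
    as described above."""
    s, w = x[::2].astype(float), x[1::2].astype(float)
    # predict: w_k^n = w_k - (s_k + s_{k+1}) / 2
    right = np.append(s[1:], s[-1])
    wn = w - 0.5 * (s + right)
    # update: s_k^n = s_k + (w_k^n + w_{k-1}^n) / 4
    left = np.insert(wn[:-1], 0, wn[0])
    sn = s + 0.25 * (wn + left)
    return sn, wn

x = np.ones(16)
sn, wn = lifted_analysis(x)
assert np.allclose(wn, 0.0)  # constant signal: no detail
assert np.allclose(sn, 1.0)  # the mean value is preserved
```

The update step is what restores the constant mean value property that the prediction step alone does not give.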
6.2 Factorizations
As we have seen in Chapter 3, to any biorthogonal wavelet basis with finite support there correspond 'polynomials', uniquely defined up to shifts and multiplication by scalars. Such a 'polynomial' may be written
$$h(z) = \sum_{k=0}^{L-1} h_k z^{-k}.$$
(A shift produces $\sum_{k=0}^{L-1} h_k z^{-k-N} = \sum_{k=-N}^{L-1-N} h_{k+N} z^{-k}$.)
As we also noted in Chapter 3, there are three other polynomials h̃, g, g̃,
such that
h(z)h̃(z −1 ) + g(z)g̃(z −1 ) = 2
h(z)h̃(−z −1 ) + g(z)g̃(−z −1 ) = 0
These are the conditions for perfect reconstruction from the analysis. We may
write them in modulation matrix notation as follows (with redundancy):
$$\begin{pmatrix} h(z^{-1}) & g(z^{-1}) \\ h(-z^{-1}) & g(-z^{-1}) \end{pmatrix} \begin{pmatrix} \tilde h(z) & \tilde h(-z) \\ \tilde g(z) & \tilde g(-z) \end{pmatrix} = 2 \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$
Splitting h into even- and odd-indexed parts, $h(z) = h_e(z^2) + z^{-1} h_o(z^2)$, and similarly for g, h̃, and g̃, we form the polyphase matrix
$$P(z) = \begin{pmatrix} h_e(z) & g_e(z) \\ h_o(z) & g_o(z) \end{pmatrix}$$
and $\widetilde P(z)$ analogously. The perfect reconstruction conditions then read
$$\widetilde P(z^{-1})^t P(z) = I,$$
where I is the identity matrix. We now shift and scale g and g̃ so that det P(z) ≡ 1.²
Note that the basis is orthogonal precisely when P = P̃ , i.e., h = h̃ and
g = g̃.
Looking at P̃ (z −1 )t P (z) = I, we note that P (i.e., h and g) determines
P̃ (i.e., h̃ and g̃) since the matrix P̃ (z −1 )t is the inverse of the matrix P (z)
and det P (z) = 1.
We will even see that, given h such that he and ho have no common
zeroes (except 0 and ∞), such a P (and thus P̃ ) can be constructed, using
the Euclidean division algorithm on the given he and ho .
The Euclidean division algorithm for integers is now demonstrated in a specific case, by which the general principle becomes obvious.
85 = 2 · 34 + 17
34 = 2 · 17 + 0
using 2 · 34 ≤ 85 < 3 · 34. Now, clearly, 17 divides both 34 and 85, and is the
greatest common divisor. We now proceed with 83 and 34 instead.
83 = 2 · 34 + 15
34 = 2 · 15 + 4
15 = 3 · 4 + 3
4=1·3+1
3=3·1+0
This means that 1 is the greatest common divisor of 83 and 34, and they are
thus relatively prime. In matrix notation this is
$$\begin{pmatrix} 83 \\ 34 \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 2 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 3 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 3 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix}.$$
² This is possible, since det P(z) must have length 1 if the inverse $\widetilde P(z^{-1})^t$ is to contain polynomials only. Scaling and shifts are thus made in unison.
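The integer version of the algorithm, together with the matrix bookkeeping used above, can be sketched as follows (a small illustration; the function name is ours):

```python
def euclid_steps(a, b):
    """Euclidean division algorithm, recording the quotients, so that
    (a, b)^t = Q(q1) Q(q2) ... (gcd, 0)^t with Q(q) = [[q, 1], [1, 0]]."""
    quotients = []
    while b != 0:
        q, r = divmod(a, b)
        quotients.append(q)
        a, b = b, r
    return quotients, a  # a is now the gcd

# the two worked examples from the text
qs1, g1 = euclid_steps(85, 34)
assert (qs1, g1) == ([2, 2], 17)

qs2, g2 = euclid_steps(83, 34)
assert (qs2, g2) == ([2, 2, 3, 1, 3], 1)  # 83 and 34 are relatively prime

# reconstruct (83, 34) by multiplying the quotient matrices back
v = (g2, 0)
for q in reversed(qs2):
    v = (q * v[0] + v[1], v[0])
assert v == (83, 34)
```

The reconstruction loop is exactly the matrix product above: each quotient q contributes a factor [[q, 1], [1, 0]].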
Here the algorithm stops when the remainder is 0. We may represent the
steps of the algorithm as
$$\begin{pmatrix} 1 + 4z^{-1} + z^{-2} \\ 1 + z^{-1} \end{pmatrix} = \begin{pmatrix} 1 + z^{-1} & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} \tfrac{1}{2}z + \tfrac{1}{2} & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 2z^{-1} \\ 0 \end{pmatrix}.$$
Note that the number of steps will be bounded by the length of the division,
here 2.
When the common factor has length 1, the two polynomials he and ho
have no zeroes in common, except possibly 0 and ∞.
Example 6.3. The Haar case may be represented by h(z) = 1 + z −1 with
he (z) = ho (z) ≡ 1. The algorithm may then be represented by
$$\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix},$$
which is just one step.
We will eventually arrive at the following factorization:
$$\begin{pmatrix} h_e(z) \\ h_o(z) \end{pmatrix} = \prod_{i=1}^{k/2} \begin{pmatrix} 1 & s_i(z) \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ t_i(z) & 1 \end{pmatrix} \begin{pmatrix} C \\ 0 \end{pmatrix} \tag{6.3}$$
if we shift both $h_e$ and $h_o$ with the factor $z^M$ from the algorithm. (This means that $g_e$ and $g_o$ have to be shifted with $z^{-M}$ to preserve det P(z) = 1.)
6.3 Lifting
The factorization (6.3) gives a polyphase matrix P n (z) through
$$P^n(z) := \begin{pmatrix} h_e(z) & g^n_e(z) \\ h_o(z) & g^n_o(z) \end{pmatrix} := \prod_{i=1}^{k/2} \begin{pmatrix} 1 & s_i(z) \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ t_i(z) & 1 \end{pmatrix} \begin{pmatrix} C & 0 \\ 0 & 1/C \end{pmatrix},$$
where the last scaling 1/C is chosen to give det $P^n(z) = 1$. Here the superscript n indicates that $g^n_e$ and $g^n_o$ may not come from the same highpass filter g which we started from in P. All we did was to use the Euclidean algorithm on $h_e$ and $h_o$ without any regard to g.
Moreover, given a polyphase matrix P (z), any P n (z) with the same h(z),
i.e., identical first columns and det P n (z) = det P (z) = 1, is thus related to
P (z) through
$$P^n(z) = P(z) \begin{pmatrix} 1 & s(z) \\ 0 & 1 \end{pmatrix}$$
for some polynomial s(z). In the same way, any P n (z) with the same g(z)
as P (z) and with det P n (z) = 1 satisfies
$$P^n(z) = P(z) \begin{pmatrix} 1 & 0 \\ t(z) & 1 \end{pmatrix}$$
for some polynomial t(z). In these two cases P n is said to be obtained from
P by lifting and dual lifting, respectively.
In this terminology, we can now conclude that any polyphase matrix can
be obtained from the trivial subsampling of even and odd indexed elements
(with the trivial polyphase matrix I, i.e., h(z) = 1 and g(z) = z −1 ) by
successive lifting and dual lifting steps and a scaling.
6.4 Implementations
We will now turn to how the factorization is implemented. The polyphase
matrix P (z)t performs the analysis part of the transform. E.g., with x(z) =
xe (z 2 ) + z −1 xo (z 2 ) as before,
$$P(z)^t \begin{pmatrix} x_e(z) \\ x_o(z) \end{pmatrix} = \begin{pmatrix} h_e(z)x_e(z) + h_o(z)x_o(z) \\ g_e(z)x_e(z) + g_o(z)x_o(z) \end{pmatrix}$$
represents the even numbered entries of the outputs h(z)x(z) and g(z)x(z)
after subsampling.
Specifically, we will now discuss the Haar case with h(z) = 1 + z −1 .
Example 6.4. Using the algorithm on the Haar lowpass filter, h(z) = 1 +
z −1 , we have he (z) = ho (z) = 1, and obtain
$$\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix}.$$
This gives
$$P^n(z) = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}.$$
If we take the Haar filters $h(z) = 1 + z^{-1}$ and $g(z) = -1/2 + \tfrac{1}{2}z^{-1}$, then we have (identical first columns)
$$P^n(z) = P(z) \begin{pmatrix} 1 & s(z) \\ 0 & 1 \end{pmatrix}.$$
Correspondingly, we get
$$\widetilde P(z^{-1}) = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 1/2 & 1 \end{pmatrix}.$$
Denote the signal to be analyzed by $\{x_k\}_k$,³ and let the successive low- and highpass components of it be $\{v^{(j)}_k\}_k$ and $\{u^{(j)}_k\}_k$ after stage j = 1, 2, . . . , respectively. In our example, the analysis becomes
$$\begin{cases} v^{(1)}_k = x_{2k} \\ u^{(1)}_k = x_{2k+1} \end{cases} \qquad \begin{cases} v^{(2)}_k = v^{(1)}_k + u^{(1)}_k \\ u^{(2)}_k = u^{(1)}_k \end{cases} \qquad \begin{cases} v^{(3)}_k = v^{(2)}_k \\ u^{(3)}_k = -\tfrac{1}{2} v^{(2)}_k + u^{(2)}_k \end{cases}$$
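The three stages above can be sketched directly (our illustration):

```python
import numpy as np

def haar_lifted(x):
    """The three analysis stages: lazy split, a dual lifting step
    (v += u), then a lifting step (u -= v/2); together they give the
    unnormalized Haar filters h(z) = 1 + z^{-1}, g(z) = -1/2 + z^{-1}/2."""
    v1, u1 = x[::2].astype(float), x[1::2].astype(float)
    v2, u2 = v1 + u1, u1
    v3, u3 = v2, -0.5 * v2 + u2
    return v3, u3

x = np.array([2.0, 4.0, 6.0, 6.0])
v, u = haar_lifted(x)
assert np.allclose(v, [6.0, 12.0])  # pairwise sums
assert np.allclose(u, [1.0, 0.0])   # half differences
```

Note that each stage operates in place on the pair (v, u), which is what makes lifting attractive for memory-efficient and integer-to-integer implementations.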
Exercises 6.4
6.1. Determine the polyphase matrix P (z) for the lazy filter bank: h(z) =
h̃(z) = 1 and g(z) = g̃(z) = z −1 . (identity)
(1 + z −1 )(1 + 2z −1 ) = 1 + 3z −1 + 2z −2 and 1 + z −1 .
³ This was denoted by the letter s instead of x in Section 6.1.
6.3. Determine two distinct polyphase and dual polyphase matrices sharing
the polynomial h(z) in Example 6.2.
6.4. Consider what happens in Example 6.4 if $h(z) = 1 + z^{-1}$ is scaled to $1/\sqrt{2} + z^{-1}/\sqrt{2}$.
6.5 Notes
This chapter is based on the paper Daubechies & Sweldens [12]. For informa-
tion about the construction of integer-to-integer transforms, see the paper by
Calderbank & Daubechies & Sweldens & Yeo [6]. A practical overview can
be found in an article by Sweldens & Schröder [29]. All three papers contain many further references.
Chapter 7

The Continuous Wavelet Transform
The continuous wavelet transform is the prototype for the wavelet techniques,
and it has its place among ’reproducing kernel’ type theories.
In comparison to the discrete transform, the continuous one allows more
freedom in the choice of the analyzing wavelet. In a way, the discrete wavelet
transform in Chapter 4 is an answer to the question of when the dyadic
sampling of the continuous transform does not entail loss of information.
This chapter may be a little less elementary, but the arguments are quite
straightforward and uncomplicated.
114 CHAPTER 7. THE CONTINUOUS WAVELET TRANSFORM
Example 7.1. Consider the wavelet $\psi(t) = t e^{-t^2}$. Take $f(t) = H(t - t_0) - H(t - t_1)$, where $t_0 < t_1$ and H(t) is the Heaviside unit jump function at t = 0.
7.2. GLOBAL REGULARITY 115
Further,
$$W f(a,b) = \int_{t_0}^{t_1} a^{-1/2}\, \frac{t-b}{a}\, e^{-\left(\frac{t-b}{a}\right)^2} dt = \frac{\sqrt{a}}{2} \left( e^{-\left(\frac{t_0-b}{a}\right)^2} - e^{-\left(\frac{t_1-b}{a}\right)^2} \right).$$
Exercises 7.1
7.1. Work out what happens if the multiplication is done by the complex conjugate of $\widehat\Psi_a$ (a different function) instead of by the complex conjugate of $\widehat\psi_a$ in Equation (7.1).
7.2. Make the modifications to the definition of the continuous wavelet trans-
form and to the reconstruction formula needed if ψ is allowed to be complex-
valued.
7.5. Consider now the function $\psi(t) = t/(1 + t^2)$, and check that it is an admissible wavelet. Let now $f(t) = H(t - t_0) - H(t - t_1)$ as in Example 7.1, compute W f(a, b), and compare with the result in the example.
$(1 + |\cdot|^2)^{s/2} \hat f \in L^2$, with norm
$$\|f\|_{H^s} = \left( \int_{-\infty}^{\infty} \left| (1 + |\omega|^2)^{s/2} \hat f(\omega) \right|^2 d\omega \right)^{1/2}.$$
$$a^{-s}\, F_b W f(a, \beta) = \hat f(\beta)\, a^{-s}\, \widehat{\psi_a}(\beta) = |\beta|^s \hat f(\beta) \cdot |a\beta|^{-s}\, \widehat{\psi_a}(\beta).$$
Squaring the absolute values, integrating, and using the Parseval formula,
we obtain
$$\int_{-\infty}^{\infty}\!\int_0^{\infty} |a^{-s} W f(a,b)|^2\, \frac{da}{a^2}\, db = \int_{-\infty}^{\infty} |F^{-1}(|\cdot|^s F f)(t)|^2\, dt \cdot \int_0^{\infty} |\omega|^{-2s} |\hat\psi(\omega)|^2\, \frac{d\omega}{\omega}$$
if
$$0 < \int_0^{\infty} |\omega|^{-2s} |\hat\psi(\omega)|^2\, \frac{d\omega}{\omega} < \infty.$$
¹ The definition of H^s needs to be made more precise, but we will only note that infinitely differentiable functions f which vanish outside a bounded set are dense in H^s with its standard definition.
7.3. LOCAL REGULARITY 117
$$A\, \|f\|_{H^s}^2 \le \int_{-\infty}^{\infty}\!\int_0^{\infty} |a^{-s} W f(a,b)|^2\, \frac{da}{a^2}\, db \le B\, \|f\|_{H^s}^2.$$
Exercises 7.2
7.6. Show that positive constants C1 and C2 exist for which
C1 kf kH s ≤ kf k + kF −1{| · |s F f }k ≤ C2 kf kH s
Suppose that $|f(t) - f(t_0)| \le C\, |t - t_0|^s$ for t near $t_0$, where 0 < s < 1, say. This is usually called a local Lipschitz condition, and s is a measure of the regularity. The adjective local refers to the condition being tied to the point $t_0$; thus it is not global or uniform.
What does this local regularity condition on the function f imply for the
wavelet transform Wψ f (a, b)?
We note first that the cancellation property $\int \psi(t)\, dt = 0$ makes it possible to insert $f(t_0)$:
$$W f(a,b) = \int_{-\infty}^{\infty} \left( f(t) - f(t_0) \right) a^{-1/2}\, \psi\!\left( \frac{t-b}{a} \right) dt.$$
Thus
$$|W f(a, t_0)| \le C \int_{-\infty}^{\infty} |t - t_0|^s\, a^{-1/2} \left| \psi\!\left( \frac{t - t_0}{a} \right) \right| dt = C a^{\frac{1}{2} + s}$$
and
$$\begin{aligned}
|W f(a,b)| &\le \int_{-\infty}^{\infty} |f(t) - f(t_0)|\, a^{-1/2} \left| \psi\!\left( \frac{t-b}{a} \right) \right| dt + \int_{-\infty}^{\infty} |f(t_0) - f(b)|\, a^{-1/2} \left| \psi\!\left( \frac{t-b}{a} \right) \right| dt \\
&\le C a^{1/2} (a^s + |b - t_0|^s).
\end{aligned}$$
Thus the local Lipschitz regularity condition implies a growth condition on
the wavelet transform.
Conversely, going from a growth condition on the wavelet transform, there
is the following local regularity result. Note that a global condition on f is
also made, and that an extra logarithmic factor appears in the regularity
estimate.
If f is, say, bounded and
it follows that
holds for all t close enough to t0 . We omit the argument leading to this
result.
Exercises 7.3
7.7. Determine the exponent in the Lipschitz condition for the function
f (t) = ts for t > 0 and f (t) = 0 otherwise. Calculate also the transform,
using the Haar wavelet ψ(t) = t/ |t| for 0 < |t| < 1/2 and ψ(t) = 0 elsewhere.
7.4 Notes
Further material may be found in, e.g., the books by Holschneider [18], Ka-
hane & Lemarié [21], and Meyer [23].
The connection between differentiability properties and discrete wavelet
representations is described in Chapter 12.
Part II
Applications
Chapter 8

Wavelet Bases: Examples
So far we have only seen a few examples of wavelet bases. There is in fact a
large number of different wavelet bases, and in a practical application it is not
always an easy task to choose the right one. In this chapter, we will describe
the most frequently used wavelet bases. We will also try to give some general
advice on how to choose the wavelet basis in certain applications.
$$g_k = (-1)^k\, \tilde h_{1-k} \quad\text{and}\quad \tilde g_k = (-1)^k\, h_{1-k}.$$
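This alternating flip can be sketched numerically (our illustration; we use a shifted variant $g_k = (-1)^k h_{N-1-k}$, which agrees with the formula above up to an even shift, and the Daubechies-2 coefficients as the example filter):

```python
import numpy as np

def alternating_flip(h):
    """Shifted alternating flip g_k = (-1)^k h_{N-1-k} (N = len(h), even);
    the even shift does not affect the orthogonality relations."""
    k = np.arange(len(h))
    return ((-1) ** k) * h[::-1]

# Daubechies-2 (D4) orthonormal lowpass filter, normalized so sum h_k^2 = 1
h = np.array([1 + np.sqrt(3), 3 + np.sqrt(3),
              3 - np.sqrt(3), 1 - np.sqrt(3)]) / (4 * np.sqrt(2))
g = alternating_flip(h)

assert np.isclose(np.dot(h, g), 0.0)           # orthogonal at shift 0
assert np.isclose(np.dot(h[2:], g[:-2]), 0.0)  # and at even shifts
assert np.isclose(g.sum(), 0.0)                # one vanishing moment
```

The orthogonality at even shifts holds for any even-length filter h, which is exactly why the alternating flip is the standard way to obtain the highpass filter in the orthogonal case.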
124 CHAPTER 8. WAVELET BASES: EXAMPLES
coefficients where the signal is smooth. The vanishing moments thus produce
the compression ability, and are also connected to the smoothness of the syn-
thesis wavelet. We usually want the synthesis wavelets to have some smooth-
ness, which is particularly important in compression applications. This is so
because compression is achieved by, roughly speaking, leaving out terms in
the sum
$$f(t) = \sum_{j,k} w_{j,k}\, \psi_{j,k}(t),$$
Consider the term (1−y)N O(y N )+y N O((1−y)N ). This must be a polynomial
of degree at most 2N − 1. On the other hand, it has zeros of order N at both
y = 0 and y = 1. Therefore, it must be identically 0, and so PN is a solution
to (8.2). It is possible to show that every other solution must be of the form
Symmlets
A way to make orthogonal wavelets less asymmetric is to do the spectral fac-
torization without requiring the minimum-phase property. Instead of always
choosing zeros inside the unit circle, we may choose zeros to make the phase
as nearly linear as possible. The corresponding family of orthogonal wavelets
is usually referred to as least asymmetric wavelets or Symmlets. They are
clearly more symmetric than Daubechies wavelets, as can be seen in Figure
8.3. The price we have to pay is that the Symmlet with N vanishing moments is less regular than the corresponding Daubechies wavelet.
8.2. ORTHOGONAL BASES 127
[Figures: scaling functions ϕ (left) and wavelets ψ (right) for the Daubechies wavelets and the Symmlets; plot data removed.]
Coiflets
Another family of orthogonal wavelets was designed to give also the scaling
functions vanishing moments,
$$\int_{-\infty}^{\infty} t^n \varphi(t)\, dt = 0, \quad n = 1, \ldots, N-1.$$
This can be a useful property since, using a Taylor expansion, $\langle f, \varphi_{J,k} \rangle = 2^{-J/2} f(2^{-J} k) + O(2^{-Jp})$ in regions where f has p continuous derivatives. For
smooth signals, the fine-scale scaling coefficients can thus be well approxi-
mated by the sample values, and pre-filtering may be dispensed with. These
wavelets are referred to as Coiflets. They correspond to a special choice of
the polynomial R in (8.3). Their support width is 3N −1 and the filter length
is 3N. In Figure 8.4, we have plotted the first four Coiflets together with
their scaling functions. We see that the Coiflets are even more symmetric
than the Symmlets. This is, of course, obtained at the price of the increased filter length.
Figure 8.4: The first four Coiflets with scaling functions for 2, 4, 6, and 8
vanishing moments.
132 CHAPTER 8. WAVELET BASES: EXAMPLES
Exercises 8.2
8.1. Show that |Q(ω)|2 is a polynomial in cos ω. Hint: It is a trigonometric
polynomial,
$$|Q(\omega)|^2 = \sum_{k=-K}^{K} c_k e^{ik\omega} = \sum_{k=-K}^{K} c_k \cos k\omega + i \sum_{k=-K}^{K} c_k \sin k\omega.$$
is more easily detected by the human eye, if the synthesis wavelets ψj,k are
asymmetric. Also, symmetry is equivalent to filters having linear phase, cf.
Section 2.4.
In general, we have more flexibility in the biorthogonal case, since we
have two filters to design instead of one. Therefore, a general guideline could
be to always use biorthogonal bases, unless orthogonality is important for
the application at hand.
The filter design is also easier in the biorthogonal case. There are several
different ways to design the filters. One method is to choose an arbitrary synthesis lowpass filter H(z). For the analysis lowpass filter, we have to solve for $\widetilde H(z)$ in
$$H(z)\widetilde H(z^{-1}) + H(-z)\widetilde H(-z^{-1}) = 1. \tag{8.4}$$
It is possible to show that solutions exist if, e.g., H(z) is symmetric and H(z) and H(−z) have no common zeros. These solutions can be found by solving linear systems of equations for the coefficients of $\widetilde H(z)$.
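As a small numerical sanity check (our illustration, not part of the text), the Haar pair $H = \widetilde H = (1 + z^{-1})/2$ is one solution of (8.4):

```python
import numpy as np

def Hz(coeffs, z):
    """Evaluate H(z) = sum_k h_k z^{-k} for a filter indexed from 0."""
    return sum(c * z ** (-k) for k, c in enumerate(coeffs))

# Haar lowpass pair H = H~ = (1 + z^{-1})/2
h = [0.5, 0.5]
ht = [0.5, 0.5]

# check H(z) H~(z^{-1}) + H(-z) H~(-z^{-1}) = 1 on the unit circle
for omega in np.linspace(0.1, 3.0, 7):
    z = np.exp(1j * omega)
    lhs = Hz(h, z) * Hz(ht, 1 / z) + Hz(h, -z) * Hz(ht, -1 / z)
    assert np.isclose(lhs, 1.0)
```

For a longer symmetric H(z), the same identity, expanded in powers of z, yields the linear system for the unknown coefficients of $\widetilde H(z)$.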
Another method is to apply spectral factorization to different product
filters. If we want to design symmetric filters, both Q and $\widetilde Q$ must be symmetric. A detailed analysis shows that $N + \widetilde N$ must be even, and that we can write
$$Q(\omega) = q(\cos\omega) \quad\text{and}\quad \widetilde Q(\omega) = \tilde q(\cos\omega)$$
for some polynomials q and $\tilde q$. This is when Q and $\widetilde Q$ both have odd length.
They can also both have even length, and then a factor $e^{-i\omega/2}$ must be included. In any case, if we define $L = (N + \widetilde N)/2$ and the polynomial P through
$$P\!\left( \sin^2 \frac{\omega}{2} \right) = q(\cos\omega)\, \tilde q(\cos\omega),$$
the equation (8.4) transforms into
$$(1 - y)^L P(y) + y^L P(1 - y) = 2.$$
The solutions to this equation are known from the previous section. After choosing a solution, that is, choosing R in (8.3), Q and $\widetilde Q$ are computed using spectral factorization.
A third method to construct biorthogonal filters is to use lifting, as described in Chapter 6.
Figure 8.5: Biorthogonal spline wavelets with N = 1 and Ñ = 1, 2, and 3.
Figure 8.6: Biorthogonal spline wavelets with N = 2 and Ñ = 2, 4, 6, and 8.
8.3. BIORTHOGONAL BASES 137
Figure 8.7: Biorthogonal spline wavelets with N = 3 and Ñ = 3, 5, 7, and 9.
To make the filters have more similar lengths, we again invoke the spectral factorization method to factorize $P_L(z)$ into Q(z) and $\widetilde Q(z)$. To maintain symmetry, we always put zeros $z_i$ and $z_i^{-1}$ together. Also, $z_i$ and $\overline{z_i}$ must always come together in order to have real-valued filter coefficients. This
limits the number of possible factorizations, but, for fixed N and Ñ, there are still several possibilities. A disadvantage with these wavelets is that the filter coefficients are no longer dyadic rational, or even rational.
We have plotted scaling functions and wavelets for some of the most common filters in Figures 8.8–8.9.
The first filter pair is the biorthogonal filters used in the FBI finger-
print standard. They are constructed using a particular factorization of the
Daubechies-8 product filter. Both the primal and the dual wavelet have 4
vanishing moments. The filters have lengths 7 and 9, and this is the reason
for the notation 9/7 (the first number being the length of the dual lowpass
filter).
The second 6/10 pair is obtained by moving one vanishing moment (a
zero at z = −1) from the primal to the dual wavelet, and also interchanging
some other zeros. The primal scaling function and wavelet become somewhat
smoother. This filter pair is also very good for image compression.
The two last 9/11 pairs are based on two different factorizations of the
Daubechies-10 product filter.
Figure 8.8: The 6/10 and the FBI 9/7 biorthogonal filter pair.
Figure 8.9: Scaling functions and wavelets for two 9/11 biorthogonal filter pairs.
8.4. WAVELETS WITHOUT COMPACT SUPPORT 141
anyway, both for historical reasons and because of their interesting theoretical properties.
Meyer Wavelets
The Meyer wavelets are explicitly described: the Fourier transform of the
wavelet is defined as
$$\hat\psi(\omega) = \begin{cases}
(2\pi)^{-1/2}\, e^{-i\omega/2} \sin\!\left( \frac{\pi}{2}\, \nu\!\left( \frac{3}{2\pi} |\omega| - 1 \right) \right) & \text{if } \frac{2\pi}{3} < |\omega| \le \frac{4\pi}{3}, \\[4pt]
(2\pi)^{-1/2}\, e^{-i\omega/2} \cos\!\left( \frac{\pi}{2}\, \nu\!\left( \frac{3}{4\pi} |\omega| - 1 \right) \right) & \text{if } \frac{4\pi}{3} < |\omega| \le \frac{8\pi}{3}, \\[4pt]
0 & \text{otherwise}.
\end{cases}$$
Battle-Lemarié Wavelets
The construction of the Battle-Lemarié orthogonal family starts from the
B-spline scaling functions. These are not orthogonal to their integer trans-
lates, since they are positive and overlapping. Remember the orthogonality
condition for the scaling function from Section 4.4.
$$\sum_l |\hat\varphi(\omega + 2\pi l)|^2 = 1. \tag{8.5}$$
[Figure 8.10: the Battle-Lemarié scaling function and wavelet corresponding to piecewise linear splines.]
$$A \le \sum_l |\hat\varphi(\omega + 2\pi l)|^2 \le B$$
for all ω, where A and B are some positive constants. This is actually the
condition on {ϕ(t − k)} to be a Riesz basis, transformed to the Fourier
domain. Now it is possible to define the Battle-Lemarié scaling function
$$\hat\varphi^{\#}(\omega) = \frac{\hat\varphi(\omega)}{\sqrt{\sum_l |\hat\varphi(\omega + 2\pi l)|^2}}.$$
This scaling function satisfies (8.5), and thus it generates an orthogonal mul-
tiresolution analysis. The wavelet is defined by the alternating flip construc-
tion.
One can show that this ’orthogonalized’ spline scaling function spans the
same Vj spaces as the original spline scaling function. However, it will no
longer have compact support, and this also holds for the wavelet. They both
have exponential decay, though, and are symmetric. In Figure 8.10, we have plotted the Battle-Lemarié scaling function and wavelet corresponding to piecewise linear splines. Note that they are both piecewise linear, and that the decay is indeed very fast.
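The orthogonalization can be sketched numerically for the linear spline (our illustration; the closed form $(2 + \cos\omega)/3$ for the periodized sum is a known identity for the hat function):

```python
import numpy as np

def hat_ft(w):
    """Fourier transform of the centred hat (linear B-spline):
    (sin(w/2) / (w/2))^2.  Note np.sinc(t) = sin(pi t)/(pi t)."""
    return np.sinc(w / (2 * np.pi)) ** 2

def periodization(w, L=200):
    """sum_l |phi_hat(w + 2 pi l)|^2, truncated at |l| <= L."""
    ls = np.arange(-L, L + 1)
    return np.sum(np.abs(hat_ft(w + 2 * np.pi * ls)) ** 2)

w = 1.3
# for the linear spline the sum has the closed form (2 + cos w)/3
assert np.isclose(periodization(w), (2 + np.cos(w)) / 3, atol=1e-6)

# the Battle-Lemarié orthonormalization in the frequency domain
phi_sharp = hat_ft(w) / np.sqrt(periodization(w))
assert phi_sharp > 0
```

Since the periodized sum is bounded away from 0 and ∞ (the Riesz condition above), the division is well defined for every ω.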
The semiorthogonal spline wavelets of Chui & Wang use B-splines of order
N as scaling functions. The wavelets become spline functions with support
width [0, 2N − 1]. The dual scaling functions and wavelets are also splines
but without compact support. They decay very fast, though. All scaling
functions and wavelets are symmetric, and all filter coefficients are rational.
We also have analytic expressions for all scaling functions and wavelets.
8.5 Notes
One of the first families of wavelets to appear was the Meyer wavelets, which
were constructed by Yves Meyer in 1985. This was before the connection
between wavelets and filter banks was discovered, and thus the notion of
multiresolution analysis was not yet available. Instead, Meyer defined his
wavelets through explicit constructions in the Fourier domain.
Ingrid Daubechies constructed her family of orthogonal wavelets in 1988.
It was the first construction within the framework of MRA and filter banks.
Shortly thereafter followed the construction of Symmlets and Coiflets (which were asked for by Ronald Coifman in 1989, hence the name), and later also various biorthogonal bases.
For more details about the construction of different wavelet bases, we refer to Daubechies' book [11], where tables of filter coefficients can also be found. Filter coefficients are also available in WaveLab.
Chapter 9
Adaptive Bases
both are close to the total energy of the signal. (Cf. the inequality 1.1
in Chapter 1, which puts a limit to simultaneous localization in time and
in frequency.) By scaling, the basis functions ψj,k (t) then have their en-
ergy essentially concentrated to the time interval (2−j k, 2−j (k + 1)) and
the frequency interval (2j π, 2j+1π). We associate ψj,k with the rectangle
(2−j k, 2−j (k + 1)) × (2j π, 2j+1 π) in the time-frequency plane (see Figure 9.1).
These rectangles are sometimes referred to as Heisenberg boxes.
[Figure 9.1: the Heisenberg box of ψ_{j,k}: the rectangle (2^{−j}k, 2^{−j}(k+1)) × (2^j π, 2^{j+1}π) in the time-frequency plane.]
The Heisenberg boxes for the wavelets ψ_{j,k} give a tiling of the time-frequency plane as shown in Figure 9.2. The lowest rectangle corresponds to the scaling function ϕ_{j₀,0} at the coarsest scale. If we have an orthonormal wavelet basis,
Figure 9.3: Time-frequency plane for the basis associated with sampling and
the Fourier basis.
Figure 9.4: Tiling of the time-frequency plane for the windowed Fourier transform with two window sizes.
How to choose a proper window size is the major problem with the windowed
Fourier transform. If the window size is T , each segment will be analyzed
at frequencies, which are integer multiples of 2π/T . Choosing a narrow win-
dow will give us good time resolution, but bad frequency resolution, and vice
versa.
One solution to this problem is to adapt the window size to the frequency: narrow windows at high frequencies and wider windows at low frequencies. This is essentially what the wavelet transform does, even though no explicit window is involved. At high frequencies, the wavelets are well localized in time, and the Heisenberg boxes are narrow and tall. At lower frequencies, the wavelets are more spread out, and we get boxes with large width and small height. This is useful, since many high-frequency phenomena are short-lived, e.g., edges and transients, while lower-frequency components usually have a longer duration in time.
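This behaviour can be seen already in the discrete Haar transform from Chapter 1. The sketch below (plain Python, not from the text; the function names are ours) runs the averaging/differencing steps and shows that a short transient excites only the fine-scale detail coefficients near its time location:

```python
import math

def haar_step(s):
    """One orthonormal Haar analysis step: averages (low-pass) and
    differences (high-pass), each half the length of the input."""
    r2 = math.sqrt(2.0)
    lo = [(s[2*i] + s[2*i+1]) / r2 for i in range(len(s) // 2)]
    hi = [(s[2*i] - s[2*i+1]) / r2 for i in range(len(s) // 2)]
    return lo, hi

def haar_dwt(s, levels):
    """Wavelet transform: recurse on the low-pass branch only,
    matching the dyadic tiling of the time-frequency plane."""
    details = []
    for _ in range(levels):
        s, hi = haar_step(s)
        details.append(hi)
    return s, details

# A signal that is zero except for a short transient at samples 12-13.
x = [0.0] * 16
x[12], x[13] = 1.0, -1.0
approx, details = haar_dwt(x, 2)
# The transient shows up only in the finest-scale details, and only
# near its time location (index 6 of 8):
print(details[0])
```
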
Exercises 9.1
9.1. Sketch for yourself how the decomposition of the time-frequency plane
changes as we go through the filtering steps in the Forward Wavelet Trans-
form, starting at the finest scaling coefficients.
9.2 Wavelet Packets
[Diagram: the filter bank tree. s_J is split into s_{J−1} and w_{J−1}, and the splitting is repeated down to s_{J−3} and w_{J−3}; in the wavelet packet transform both the low-pass and the high-pass outputs are split at every level.]
[Figure 9.6: The ideal frequency bands, between 0 and π, for the wavelet packet spaces. The spaces form a full binary tree: V_0 splits into W_{−1}^{(0)} and W_{−1}^{(1)}; these split into W_{−2}^{(0,0)}, W_{−2}^{(0,1)}, W_{−2}^{(1,0)}, W_{−2}^{(1,1)}; and so on, down to the eight spaces W_{−3}^{(0,0,0)}, . . . , W_{−3}^{(1,1,1)}.]
In Figure 9.7 we show the tiling of the ideal time-frequency plane for two particular wavelet packet bases, together with the corresponding subtrees of the wavelet packet tree. Note that with the first basis (the left figure) we get two high-frequency basis functions with long duration in time.
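In code, the wavelet packet tree differs from the wavelet transform only in that both filter outputs are split again at every level. A small sketch with the Haar filters (an illustration, not the book's implementation):

```python
import math

def haar_step(s):
    # One orthonormal Haar analysis step: low-pass and high-pass halves.
    r2 = math.sqrt(2.0)
    lo = [(s[2*i] + s[2*i+1]) / r2 for i in range(len(s) // 2)]
    hi = [(s[2*i] - s[2*i+1]) / r2 for i in range(len(s) // 2)]
    return lo, hi

def packet_tree(s, levels):
    """Full wavelet packet tree: unlike the wavelet transform, BOTH the
    low-pass and the high-pass outputs are split again at every level.
    Returns a list of levels; level j holds 2**j coefficient blocks."""
    tree = [[s]]
    for _ in range(levels):
        nxt = []
        for block in tree[-1]:
            lo, hi = haar_step(block)
            nxt.extend([lo, hi])
        tree.append(nxt)
    return tree

x = [1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0]
tree = packet_tree(x, 3)
print([len(level) for level in tree])   # 1, 2, 4, 8 blocks per level
```

Since the Haar filters are orthogonal, the energy of the signal is the same on every level of the tree, which is what makes cost comparisons between levels meaningful later on.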
Figure 9.7: The tiling of the ideal time-frequency plane for two different wavelet packet bases, with the corresponding subtrees of the wavelet packet tree.
Figure 9.8: The ideal decomposition of the frequency plane for separable two-dimensional wavelet packet bases. [Two panels in the (ξ, η)-plane, 0 to π on each axis, showing the spaces W_{−1}^{(e)} and W_{−2}^{(e,f)}.]
Exercises 9.2

9.2. Sketch for yourself, step by step, how the ideal decomposition of the time-frequency plane changes when going from the representation in the V_J space to the wavelet packet representations in Figure 9.7.

9.3 Entropy and Best Basis Selection
The entropy of a coefficient sequence c is defined as

    H(c) = − Σ_k p_k log p_k,

where

    p_k = |c_k|²/‖c‖²   and   0 log 0 := 0.
A well-known fact about the entropy measure, which is outlined in Exercise 9.4, is the following:

Theorem 9.2. If c is a finite sequence, of length K say, then

    0 ≤ H(c) ≤ log K.

Further, the minimum value is attained only when all but one c_k are 0, and the maximum is attained only when all |c_k| equal 1/√K.
Of course, the conclusion also holds for sequences c with at most K non-zero coefficients. The number d(c) = e^{H(c)} is known in information theory as the theoretical dimension of c. It can be proved that the number of coefficients needed to approximate c with error less than ε is proportional to d(c)/ε.
However, we cannot use the entropy measure directly as a cost function, since it is not additive. But if we define the additive cost function

    Λ(c) := − Σ_k |c_k|² log |c_k|²,
we have that

(9.2)    H(c) = 2 log ‖c‖ + (1/‖c‖²) Λ(c).
Thus we see that minimizing Λ is equivalent to minimizing H for coefficient
sequences with fixed norm. Therefore, if we want to use the best basis
algorithm to minimize the entropy, we must ensure that the norm of sJ
equals the norm of [sJ−1 wJ−1 ]. In other words, we must use orthogonal
filters.
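The identity (9.2) and the bounds of Theorem 9.2 are easy to check numerically. A small sketch (plain Python; the helper names are ours):

```python
import math

def entropy(c):
    """H(c) = -sum p_k log p_k with p_k = |c_k|^2 / ||c||^2,
    using the convention 0 log 0 = 0."""
    norm2 = sum(abs(x) ** 2 for x in c)
    h = 0.0
    for x in c:
        p = abs(x) ** 2 / norm2
        if p > 0.0:
            h -= p * math.log(p)
    return h

def cost(c):
    """The additive cost Lambda(c) = -sum |c_k|^2 log |c_k|^2."""
    return -sum(abs(x) ** 2 * math.log(abs(x) ** 2) for x in c if x != 0.0)

c = [3.0, 1.0, -2.0, 0.5]
norm = math.sqrt(sum(x * x for x in c))
# Identity (9.2):  H(c) = 2 log||c|| + Lambda(c)/||c||^2
lhs = entropy(c)
rhs = 2 * math.log(norm) + cost(c) / norm ** 2
print(abs(lhs - rhs))   # ~0, up to rounding

# Theorem 9.2: the extremes are the one-spike and all-equal sequences.
K = 8
print(entropy([1.0] + [0.0] * (K - 1)))   # minimum: 0
print(entropy([1.0] * K), math.log(K))    # maximum: log K
```
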
For coefficient sequences with fixed ℓ2 (Z)-norm, Λ is also minimized when all
coefficients are 0 except one and maximized when all |ck | are equal. To get
relevant comparisons between sJ and [sJ−1 wJ−1 ], we again need the filters
to be orthogonal.
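Given an additive cost function, the best basis is found by the usual bottom-up pruning of the wavelet packet tree: a node is kept whenever it is cheaper than the best decomposition of its two children. A toy sketch with the orthogonal Haar filters and the cost Λ (function names are ours, not the book's code):

```python
import math

def cost(c):
    # Additive entropy-type cost Lambda(c) = -sum |c_k|^2 log |c_k|^2.
    return -sum(x * x * math.log(x * x) for x in c if x != 0.0)

def haar_step(s):
    r2 = math.sqrt(2.0)
    lo = [(s[2*i] + s[2*i+1]) / r2 for i in range(len(s) // 2)]
    hi = [(s[2*i] - s[2*i+1]) / r2 for i in range(len(s) // 2)]
    return lo, hi

def best_basis(s, depth):
    """Bottom-up best-basis search: keep a node if its own cost is at
    most the best total cost of its children, otherwise split it.
    Returns (total cost, list of coefficient blocks)."""
    if depth == 0 or len(s) < 2:
        return cost(s), [s]
    lo, hi = haar_step(s)
    c_lo, b_lo = best_basis(lo, depth - 1)
    c_hi, b_hi = best_basis(hi, depth - 1)
    if cost(s) <= c_lo + c_hi:
        return cost(s), [s]
    return c_lo + c_hi, b_lo + b_hi

# For a constant signal the search splits all the way down: the energy
# is concentrated into the single coefficient 2*sqrt(2).
total, leaves = best_basis([1.0] * 8, 3)
print(len(leaves), total)
```
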
Finally, we mention two application-dependent cost functions. When
using wavelet packets for reducing noise in signals, one would like to choose
the basis that gives the smallest error between the denoised signal and the
real signal. As a cost function one can use an estimate of this prediction
error. One example of this, the SURE cost function, will be discussed in
Section 10.2. In classification applications, cost functions that measure the
capability of separating classes are used. They will be discussed in more
detail in Chapter 14.
Exercises 9.3
9.3. Verify that the entropy measure is not an additive cost function, but
that Λ is. Then verify the identity (9.2).
9.4. Prove Theorem 9.2. Hint: Use that the exponential function is strictly convex, which means that

    exp(Σ_k λ_k x_k) ≤ Σ_k λ_k e^{x_k}

for λ_k ≥ 0 and Σ_k λ_k = 1, where equality occurs only when all x_k are equal. Apply this with λ_k = p_k and x_k = log(1/p_k) to obtain e^{H(c)} ≤ K. The left inequality should be obvious, since 0 ≤ p_k ≤ 1 implies log p_k ≤ 0. The only way H(c) = 0 could happen is if all p_k are either 0 or 1. But since they sum up to 1, this implies that one p_k equals 1 and the remaining are 0.
where the maximum value is attained when all |c_k| are equal, and the minimum is attained when all c_k's but one are 0. Hint: For the right-hand inequality, write

    Σ_{k=1}^{K} |c_k| = Σ_{k=1}^{K} 1 · |c_k|

and use the Cauchy–Schwarz inequality. For the left-hand inequality, you may assume that ‖c‖ = 1 (why?). Show that this implies |c_k| ≤ 1 for each k. Then |c_k|² ≤ |c_k|, and the inequality follows.
These somewhat complicated conditions are needed to ensure that the local
trigonometric basis functions, which we will define soon, give an ON basis of
L2 (R).
The local trigonometric basis functions for the partitioning I_k are constructed by filling the windows w_k(t) with cosine oscillations at frequencies π(n + ½):

    b_{n,k}(t) = √2 w_k(t) cos(π(n + ½)(t − k)).

It can be shown that the functions b_{n,k} constitute an orthonormal basis for L²(R).
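The orthonormality can be checked numerically for a concrete window. The sketch below uses the standard sine-based rising cutoff (one choice satisfying conditions of the type (9.3); the text does not fix a particular window) and approximates the inner products over the support of the bell on I_0 = [0, 1] by the midpoint rule:

```python
import math

def rho(t):
    # Rising cutoff with rho(t)^2 + rho(-t)^2 = 1 on [-1/2, 1/2].
    if t <= -0.5:
        return 0.0
    if t >= 0.5:
        return 1.0
    return math.sin(math.pi * (t + 0.5) / 2.0)

def window(t):
    # Bell over I_0 = [0, 1], supported on [-1/2, 3/2].
    return rho(t) * rho(1.0 - t)

def b(n, t):
    # b_{n,0}(t) = sqrt(2) w_0(t) cos(pi (n + 1/2) t)
    return math.sqrt(2.0) * window(t) * math.cos(math.pi * (n + 0.5) * t)

def inner(f, g, a=-0.5, c=1.5, m=20000):
    # Midpoint-rule approximation of the L2 inner product on [a, c].
    h = (c - a) / m
    return h * sum(f(a + (i + 0.5) * h) * g(a + (i + 0.5) * h)
                   for i in range(m))

print(inner(lambda t: b(0, t), lambda t: b(0, t)))  # approximately 1
print(inner(lambda t: b(0, t), lambda t: b(1, t)))  # approximately 0
```
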
When computing the coefficients c_{n,k}, after the folding, we can use a fast cosine transform.

Figure 9.11: The windowed function f(t)w(t) (solid line) and its folded version (dashed line).
Adaptive Segmentation

As mentioned earlier, the construction of the basis functions b_{n,k} works for an arbitrary partition of the real line. We seek an optimal partition, using, for instance, one of the cost functions described above. This can be done by adaptively merging intervals, starting from some initial partition. Let us consider merging the intervals I_0 = [0, 1] and I_1 = [1, 2] to Ĩ_0 = [0, 2]. A window function for Ĩ_0 is given by

    w̃_0(t) = √(w_0(t)² + w_1(t)²),

which can be shown to satisfy the conditions (9.3a)–(9.3e), slightly modified. The basis functions associated with Ĩ_0 are defined by

    b̃_{n,0}(t) = (1/√2) w̃_0(t) cos(π(n + ½) t/2).
Exercises 9.4

9.6. Show that the folded function f̃ is given by

    f̃(k + t) = { w_k(k − t)f(k + t) − w_k(k + t)f(k − t),  for −½ ≤ t ≤ 0,
               { w_k(k + t)f(k + t) + w_k(k − t)f(k − t),  for 0 ≤ t ≤ ½.
Also show that f can be recovered from its folded version through

    f(k + t) = { w_k(k + t)f̃(k − t) + w_k(k − t)f̃(k + t),  for −½ ≤ t ≤ 0,
              { w_k(k + t)f̃(k + t) − w_k(k − t)f̃(k − t),  for 0 ≤ t ≤ ½.
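The folding and unfolding formulas of the exercise are easy to verify numerically around a single point k = 0, for a window with w(t)² + w(−t)² = 1 (our choice of window below; any rising cutoff with this property works):

```python
import math

def w(t):
    """A window with w(t)^2 + w(-t)^2 = 1 on [-1/2, 1/2] (rising cutoff)."""
    if t <= -0.5:
        return 0.0
    if t >= 0.5:
        return 1.0
    return math.sin(math.pi * (t + 0.5) / 2.0)

def fold(f, t):
    # Folding around k = 0, following the exercise's formulas.
    if t <= 0.0:
        return w(-t) * f(t) - w(t) * f(-t)
    return w(t) * f(t) + w(-t) * f(-t)

def unfold(ft, t):
    # The inverse operation: recovers f from its folded version ft.
    if t <= 0.0:
        return w(t) * ft(-t) + w(-t) * ft(t)
    return w(t) * ft(t) - w(-t) * ft(-t)

f = lambda t: 1.0 + t + t * t          # any smooth test function
ft = lambda t: fold(f, t)
# Round trip: unfolding the folded function gives f back on (-1/2, 1/2).
errs = [abs(unfold(ft, t) - f(t)) for t in (-0.4, -0.1, 0.2, 0.45)]
print(max(errs))   # essentially 0
```

The determinant of the 2×2 system relating (f(k+t), f(k−t)) to (f̃(k+t), f̃(k−t)) is w(k+t)² + w(k−t)² = 1, which is exactly why the window condition makes the folding invertible.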
9.5 Notes
For further reading about wavelet packets and local trigonometric bases, and
time-frequency decompositions in general, we refer to the paper Lectures on
Wavelet Packet Algorithms [31] by Wickerhauser.
Wavelet packets can also be defined in the non-separable case. Wavelet
packets for hexagonal wavelets give rise to some fascinating frequency plane
decompositions, which can be found in the paper by Cohen and Schlenker
[10].
Chapter 10

Compression and Noise Reduction

10.1 Image Compression

[Diagram: the transform coding pipeline: f → Image transform → Quantization → Entropy coding.]
Image Transform
Most images have spatial correlation, that is, neighbouring pixels tend to have similar grey-scale values. The purpose of the image transform is to exploit this redundancy to make compression possible. Remember the two-dimensional Haar transform, where groups of four pixel values are replaced by their mean value and three wavelet coefficients, or 'differences'. If those four pixel values are similar, the corresponding wavelet coefficients will be essentially 0. The averaging and differencing are repeated recursively on the mean values, to capture large-scale redundancy. For smooth regions of the image, most wavelet coefficients will be almost 0. Fine-scale wavelet coefficients are needed around edges and in areas with rapid variation. A few large-scale coefficients will take care of slow variations in the image. For images without too much variation, a few wavelet coefficients contain the relevant information in the image. For images with texture, such as the fingerprint images, a wavelet packet transform might be more appropriate.
Most of the compression is achieved in the first filtering steps of the
wavelet transform. Therefore, in wavelet image compression, the filter bank
is usually iterated only a few times, say, 4 or 5. A smoother wavelet than the
Haar wavelet is used, since compression with Haar wavelets leads to blocking
artifacts; rectangular patterns appear in the reconstructed image. The choice
of an optimal wavelet basis is an open problem, since there are many aspects
to take into account.
First, we want synthesis scaling functions and wavelets to be smooth.
At the same time, smoothness increases the filter length, and thus also the
support width of scaling functions and wavelets. Too long synthesis filters
will give rise to ringing artifacts around edges. Further, we want all filters
to be symmetric.
Another problem associated with wavelet image compression is border artifacts. The wavelet transform assumes that f(x, y) is defined in the whole plane, and therefore the image needs to be extended outside the borders. Three extensions are used in practice: zero-padding, periodic extension, and symmetric extension. Zero-padding defines the image to be zero outside the borders. After compression, this will have a 'darkening' influence on the image near the border. Periodic extension assumes that the image extends periodically outside the borders. Unless the grey-scale values at the left border match those at the right border etc., periodic extension will induce discontinuities at the borders, which again lead to compression artifacts. Generally, the best method is symmetric extension, which gives a continuous extension at the borders, and no compression artifacts appear. Symmetric extension requires symmetric filters. An alternative is to use so-called boundary corrected wavelets. We will discuss the boundary problem further in Chapter 15.
Another possible image transform is the classical Fourier transform. If the signal is smooth, the Fourier coefficients decay very fast towards high frequencies, and the image can be represented using a fairly small number of low-frequency coefficients. However, the presence of a single edge will cause the Fourier coefficients to decay very slowly, and compression is no longer possible. A way to get around this is to use a windowed Fourier transform instead. This is basically the JPEG algorithm, where the image is divided into blocks of 8 × 8 pixels, and a cosine transform is applied to each block. For blocks containing no edges, high-frequency coefficients will be almost zero.

For high compression ratios, blocking artifacts appear in the JPEG algorithm, that is, the 8 × 8 blocks become visible in the reconstructed image. With properly chosen wavelets, wavelet image compression works better. For moderate compression ratios, e.g. 1:10, the performance of JPEG is comparable to wavelets.
[Figure: A symmetric quantizer with dead zone. Inputs with |x| < d_1 are mapped to 0, inputs with d_1 < |x| < d_2 to ±r_1, and inputs beyond d_2 to ±r_2.]
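A quantizer of this type maps a whole interval of transform values to a single reconstruction level, with a dead zone around zero that sends small coefficients to exactly 0. A minimal sketch (the thresholds d_i and levels r_i are free parameters chosen here for illustration):

```python
def deadzone_quantize(x, decisions, levels):
    """Symmetric quantizer: |x| below decisions[0] maps to 0 (the dead
    zone); |x| between decisions[i-1] and decisions[i] maps to
    sign(x) * levels[i-1].  decisions = [d1, d2, ...], levels = [r1, r2, ...]."""
    mag = abs(x)
    sign = 1.0 if x >= 0 else -1.0
    out = 0.0
    for d, r in zip(decisions, levels):
        if mag > d:
            out = sign * r
        else:
            break
    return out

d = [0.5, 1.5]   # decision thresholds d1, d2
r = [1.0, 2.0]   # reconstruction levels r1, r2
print([deadzone_quantize(x, d, r) for x in (-2.0, -0.7, 0.2, 0.9, 1.8)])
# -> [-2.0, -1.0, 0.0, 1.0, 2.0]
```

The dead zone is what turns the many near-zero wavelet coefficients of a smooth region into exact zeros, which the entropy coder can then represent very cheaply.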
Video Compression

A video signal is a sequence of images f_i(x, y). Each second contains approximately 30 images, so the amount of information is huge. In order to transfer video signals over the internet or telephone lines (video conferencing), massive compression is necessary. The simplest approach to video compression is to compress each image f_i separately. However, this method does not exploit temporal correlation in the video signal: adjacent frames tend to be very similar. One way to do so is to treat the video signal as a 3D signal f(x, y, t) and apply a three-dimensional wavelet transform.
Another method is to compute the difference images
∆fi = fi+1 − fi .
Together with an initial image f0 these difference images contain the informa-
tion necessary to reconstruct the video signal. The difference images contain
the changes between adjacent frames. For parts of the images without move-
ment, the difference images will be zero, and thus we have already achieved
significant compression. Further compression is obtained by exploiting spa-
tial redundancy in the difference images, and applying a two-dimensional
wavelet transform W to all ∆fi and to f0 . The transformed images W ∆fi
and W f0 are quantized and encoded, and then transmitted/stored.
To reconstruct the video signal, inverse wavelet transforms are computed to get back approximations Δf̂_i and f̂_0. From there, we can recover the video signal approximately by

    f̂_{i+1} = f̂_i + Δf̂_i.
The sparse structure of Δf_i can be used to speed up the inverse wavelet transforms. For wavelets with compact support, each scaling and wavelet coefficient only affects a small region of the image. Thus we only need to compute the pixels in Δf_i corresponding to non-zero scaling and wavelet coefficients.
The difference scheme just described is a special case of motion estimation, where we try to predict a frame f_i from M previous frames and apply the wavelet transform to the prediction errors Δf_i = f̂_i − f_i. The predictor P tries to discover motions in the video to make an accurate guess about the next frame. The wavelet transform is well suited for this, since it contains local information about the images.
10.2 Denoising
Suppose that a signal f(t) is sampled on the unit interval [0, 1] at the points t_k = 2^{−J}k, k = 1, . . . , K = 2^J. Denote the exact sample values by f_k = f(t_k). Assume that we only have noisy measurements of f_k, i.e., we have data y_k = f_k + σz_k. Here, z_k is assumed to be Gaussian white noise, i.e., independent normally distributed random variables with mean 0 and variance 1. The parameter σ is the noise level, which is generally unknown and has to be estimated from the data.

We want to recover f from the noisy data. Applying an orthogonal discrete wavelet transform W yields

    W y = W f + σW z,

or, coefficient by coefficient,

    γ_{j,k} = w_{j,k} + σ z̃_{j,k},
wavelet coefficients γj,k that mostly contain noise, and extract the test signal
by keeping large coefficients. Note the similarity with the crude wavelet
image compression algorithm.
Figure 10.6: The HeaviSine test function and its wavelet coefficients, with and without noise.
Coefficients with absolute value less than some threshold T are shrunk to 0, and all other coefficients are left unaltered. The threshold needs to be properly chosen, depending on the noise level σ. There is also a soft thresholding,
    ŝ_{j0,k} = λ_{j0,k},
    ŵ_{j,k} = η_T(γ_{j,k}).
Denoising with soft thresholding minimizes the risk under the constraint that f̂ should, with high probability, be at least as smooth as f. We will not make this more precise. Instead, we just mention that hard thresholding generally leads to smaller mean squared error than soft thresholding, but that the estimator f̂ is not as smooth.
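The two thresholding rules η_T can be sketched as follows (plain Python; the values are chosen so the arithmetic is exact):

```python
def hard_threshold(x, T):
    # Keep coefficients with |x| > T, set all others to 0.
    return x if abs(x) > T else 0.0

def soft_threshold(x, T):
    # Shrink every coefficient toward 0 by T ("wavelet shrinkage").
    if x > T:
        return x - T
    if x < -T:
        return x + T
    return 0.0

coeffs = [5.0, -0.5, 1.5, -4.0, 0.25]
T = 1.0
print([hard_threshold(c, T) for c in coeffs])  # [5.0, 0.0, 1.5, -4.0, 0.0]
print([soft_threshold(c, T) for c in coeffs])  # [4.0, 0.0, 0.5, -3.0, 0.0]
```

Note that soft thresholding is continuous in the coefficient value, which is the source of the extra smoothness of the resulting estimator.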
In a practical application, the threshold has to be fine-tuned to the particular class of signals under consideration. Also, the noise level is in general unknown and has to be estimated from the data. This is done using the wavelet coefficients on the finest scale, since the influence of the signal f is usually smallest there. The noise level is estimated as

    σ̂ = Median(|γ_{J−1,k}|)/0.6745.
The reason for using a median estimate is to reduce the influence of outliers, i.e., noisy wavelet coefficients with a large signal content. The same estimate can be used for noise reduction with coloured noise. In that case, scale-dependent thresholds are chosen as

    T_j = √(2 log K) Median(|γ_{j,k}|)/0.6745.

The wavelet coefficients at each scale are then thresholded according to these thresholds.
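A sketch of the median-based noise estimate. The constant 0.6745 is the median of |Z| for a standard Gaussian Z, which makes the estimate consistent for pure noise while staying robust against a few large signal coefficients:

```python
import random
import statistics

def estimate_noise_level(coeffs):
    """Robust noise-level estimate sigma-hat = Median(|gamma|)/0.6745."""
    return statistics.median(abs(c) for c in coeffs) / 0.6745

rng = random.Random(0)
sigma = 2.0
# Fine-scale coefficients: mostly noise, plus a few large signal outliers.
gamma = [sigma * rng.gauss(0.0, 1.0) for _ in range(4001)] + [50.0, -40.0, 30.0]
print(estimate_noise_level(gamma))   # close to sigma = 2.0
```

A mean-based estimate such as sqrt(mean(gamma^2)) would be pulled far off by the three outliers; the median barely moves.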
10.3 Notes
The overview article by Jawerth et al. [17] is a good starting point for further reading about wavelet image and video compression. The book by Nguyen & Strang [27] contains a more detailed discussion of various aspects of image compression, such as quantization, entropy coding, border extension, and filter design. Comparisons between wavelet image compression and the JPEG compression algorithm can also be found there.
The overview article [13] by Donoho contains a comprehensive description
of wavelet denoising algorithms. It also includes several numerical examples,
both for synthetic and real-world signals.
Chapter 11

Fast Numerical Linear Algebra
In this chapter we will study the numerical solution of linear equations. Typ-
ically, we are given a linear operator T and a function f , and we seek the
solution u of the equation
T u = f.
The linear operator T is either a differential or integral operator. Discretiza-
tion of such an equation leads to a system of linear equations with a large
number of unknowns. The discretization normally starts at some coarse scale,
or grid, which is successively refined giving a sequence of linear equations de-
noted as
Tj uj = fj .
Here T_j is a matrix or, equivalently, an operator on a suitable finite-dimensional space. For differential equations this matrix is sparse and ill-conditioned, and for integral equations the matrix is full and, depending on the operator, sometimes ill-conditioned. Today these equations are solved using various iterative methods. The most efficient methods are multilevel or multigrid methods. These take advantage of the fact that we have a sequence of scales, or operators, and are also quite simple. Recently it has been proposed to use wavelets for the solution of both linear and nonlinear equations. We will describe the so-called non-standard form of an operator in a wavelet basis, and see how it relates to standard multilevel methods.
The kernel is called the logarithmic potential and is more commonly defined
on a curve in the plane.
11.2 Discretization
We will consider the Galerkin method for the discretization of linear operator equations. There exist several other discretization methods, such as the finite difference method for differential equations and the collocation method for integral equations. These can be seen as special cases of the Galerkin method, though, with certain choices of quadrature formulas and function spaces. In any case, wavelet and other multilevel methods work in the same way.

Now, assume that the operator T : V → V, where V is some Hilbert space such as L²(R). The equation T u = f is then equivalent to finding a u ∈ V such that
    u_j = Σ_{k=1}^{N} a_k ϕ_k,
for some numbers (a_k). If we substitute this expression into equation (11.3), and use the fact that (ϕ_k)_{k=1}^{N} is a basis of V_j, equation (11.3) is equivalent to

    Σ_{k=1}^{N} a_k ⟨T ϕ_k, ϕ_n⟩ = ⟨f, ϕ_n⟩,   for n = 1, . . . , N.

This is the linear system

    T_j u_j = f_j,   with   (T_j)_{n,k} = ⟨T ϕ_k, ϕ_n⟩,

and where u_j and f_j are vectors with components a_n and ⟨f, ϕ_n⟩, respectively.
Throughout this chapter we will, with a slight abuse of notation, use the same symbol T_j to denote both a matrix and an operator on V_j. Similarly, we use the notation u_j to denote both a vector with components a_k and the corresponding function u_j = Σ_{k=1}^{N} a_k ϕ_k.
Example 11.1. For the differential equation (11.1) the elements of the matrix T_j are (by partial integration)

    (T_j)_{n,k} = ∫_0^1 ϕ′_k(x) ϕ′_n(x) dx,
Example 11.2. For the integral equation (11.2) the elements of the matrix T_j are given by

    (T_j)_{n,k} = − ∫_0^1 ∫_0^1 ϕ_k(y) ϕ_n(x) log |x − y| dx dy,
There are several natural choices of the finite dimensional spaces Vj for this
equation, and the simplest is to let Vj be the space of piecewise constant
functions on a grid with Nj = 2j intervals. The basis functions spanning
this space are the box functions. In this case the matrix Tj is full because
of the coupling factor log |x − y| in the integrand. The condition number of
the matrix is also large (since the continuous operator T is compact).
    T_J u_J = f_J.

Recall that this linear system was equivalent to the set of equations

    Σ_{k=1}^{N_J} u_{J,k} ⟨T ϕ_{J,k}, ϕ_{J,n}⟩ = ⟨f, ϕ_{J,n}⟩,   for n = 1, . . . , N_J,

where u_J = Σ_{k=1}^{N_J} u_{J,k} ϕ_{J,k}. From the subspace splitting V_J = V_{J−1} ⊕ W_{J−1} we can also write u_J as

    u_J = Σ_{k=1}^{N_{J−1}} u_{J−1,k} ϕ_{J−1,k} + Σ_{k=1}^{N_{J−1}} w_{J−1,k} ψ_{J−1,k},

for n = 1, . . . , N_{J−1}. This follows since the scaling functions (ϕ_{J−1,k})_{k=1}^{N_{J−1}} form a basis of V_{J−1} and the wavelets (ψ_{J−1,k})_{k=1}^{N_{J−1}} form a basis of W_{J−1}. We write
11.4 The Standard Form
We know that for an MRA the scaling functions (ϕ_{j,k})_{k=1}^{N_j} span the V_j-spaces and the wavelets (ψ_{j,k})_{k=1}^{N_j} span the W_j-spaces. By a change of basis, the equation T_J u_J = f_J on V_J can be written as the following block matrix system
    [ T_L        C_{L,L}    ...  C_{L,J−1}   ] [ u_L     ]   [ f_L     ]
    [ B_{L,L}    A_{L,L}    ...  A_{L,J−1}   ] [ w_L     ] = [ d_L     ]
    [   ...        ...      ...    ...       ] [  ...    ]   [  ...    ]
    [ B_{J−1,L}  A_{J−1,L}  ...  A_{J−1,J−1} ] [ w_{J−1} ]   [ d_{J−1} ]
The standard form does not fully utilize the hierarchical structure of a mul-
tiresolution analysis. Therefore we will not consider it any further in this
chapter.
11.5 Compression
So far we have said nothing about the structure of the A_j, B_j, and C_j matrices. Since wavelets have vanishing moments, it turns out that these matrices are sparse. More specifically, for one-dimensional problems they have a rapid off-diagonal decay. Even when the T_j matrices are ill-conditioned, the A_j matrices are well conditioned, making the non-standard form a suitable representation for iterative methods.
Let us now prove that integral operators produce sparse matrices in the non-standard form. Start with an integral operator with kernel K(x, y),

    T u(x) = ∫ K(x, y) u(y) dy.

For the moment we assume that the kernel K(x, y) is smooth away from the diagonal x = y, where it is singular. Typical examples of such kernels are the following
For simplicity we will use the Haar wavelet basis, which has one vanishing moment, that is,

    ∫ ψ_{j,k}(x) dx = 0.

The support of the Haar wavelet ψ_{j,k}, as well as the scaling function ϕ_{j,k}, is the interval I_{j,k} = [2^{−j}k, 2^{−j}(k + 1)]. Now, consider the elements of the matrix B_j (the A_j and C_j matrices are treated similarly),

    (B_j)_{n,k} = ∫_{I_{j,k}} ∫_{I_{j,n}} K(x, y) ψ_{j,n}(x) ϕ_{j,k}(y) dx dy.
Since the Haar wavelet has one vanishing moment, a Taylor expansion of K in the x-variable gives

    ∫_{I_{j,n}} K(x, y) ψ_{j,n}(x) dx = ∫_{I_{j,n}} ∂_x K(ξ, y) x ψ_{j,n}(x) dx,

where C = ∫ x ψ_{j,n}(x) dx and |I_{j,n}| = 2^{−j}. This gives us an estimate for the size of the elements of B_j:

    |(B_j)_{n,k}| ≤ C 2^{−j} ∫_{I_{j,k}} max_{x∈I_{j,n}} |∂_x K(x, y)| dy.
To proceed from here we need to know the off-diagonal decay of the kernel.
For the logarithmic potential we have
Exactly the same estimate will hold for the elements of Cj . For the matrix
Aj we can also make a Taylor expansion of the kernel in the y-variable, giving
an even faster off-diagonal decay of its elements
For other integral operators, similar estimates for the off-diagonal decay will hold. Moreover, increasing the number of vanishing moments of the wavelet increases the rate of decay. As a matter of fact, there is a large class of integral operators, referred to as Calderón–Zygmund operators, for which one can prove a general estimate. A Calderón–Zygmund operator is a bounded integral operator on L²(R), with a kernel satisfying the estimates
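The decay predicted by the estimate for (B_j)_{n,k} can be observed numerically. The sketch below computes a few Haar elements of B_j for the logarithmic potential by brute-force midpoint quadrature (an illustration only, not the algorithm of the text):

```python
import math

def haar_element(j, n, k, kernel, m=40):
    """(B_j)_{n,k} = double integral of K(x,y) psi_{j,n}(x) phi_{j,k}(y),
    approximated by the midpoint rule with m*m points.  psi_{j,n} is
    +2^{j/2} on the left half of I_{j,n} and -2^{j/2} on the right half;
    phi_{j,k} equals 2^{j/2} on I_{j,k}."""
    h = 2.0 ** (-j)
    s = 2.0 ** (j / 2.0)
    dx = h / m
    total = 0.0
    for a in range(m):
        x = h * n + (a + 0.5) * dx
        psi = s if a < m // 2 else -s
        for b in range(m):
            y = h * k + (b + 0.5) * dx
            total += kernel(x, y) * psi * s * dx * dx
    return total

K = lambda x, y: -math.log(abs(x - y))   # the logarithmic potential
# The elements decay as the index distance |n - k| (hence |x - y|) grows:
vals = [abs(haar_element(4, 2, 2 + d, K)) for d in (2, 4, 8)]
print(vals)
```
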
[Figure: The recursive block structure of the non-standard form, with the blocks B_{J−1}, C_{J−1}, A_{J−1} at the finest scale and T_{J−2}, B_{J−2}, C_{J−2}, A_{J−2} nested inside.]
    j     N     κ(T_j)   κ(A_j)
    5     32      57      1.94
    6     64     115      1.97
    7    128     230      1.98
    8    256     460      1.99
Inspired by the multigrid method, we begin by solving the first equation for w_{J−1} approximately, using a simple smoother. This should work well since A_{J−1} is well conditioned. Next, we update the right-hand side of the second equation and solve it for u_{J−1}. Since T_{J−1} is still ill-conditioned, we solve this equation recursively by splitting T_{J−1} one step further. When we have reached a sufficiently coarse-scale operator T_L, we solve the equation for u_L exactly. Finally, we update the right-hand side of the first equation and repeat the above steps a number of times.

Based on this we propose the following recursive algorithm; see Figure 11.2. The number of loops K is a small number, typically less than 5.
function u_j = Solve(u_j^{(0)}, f_j)
    if j = L
        Solve T_j u_j = f_j using Gaussian elimination
    else
        Project u_j^{(0)} onto V_{j−1} and W_{j−1} to get u_{j−1}^{(0)} and w_{j−1}^{(0)}
        Project f_j onto V_{j−1} and W_{j−1} to get f_{j−1} and d_{j−1}
        for k = 1, . . . , K
            u_{j−1}^{(k)} = Solve(u_{j−1}^{(k−1)}, f_{j−1} − C_{j−1} w_{j−1}^{(k−1)})
            w_{j−1}^{(k)} = Iter(w_{j−1}^{(k−1)}, d_{j−1} − B_{j−1} u_{j−1}^{(k)})
        end
        u_j = u_{j−1}^{(K)} + w_{j−1}^{(K)}
    end

Here Iter(w_j^{(0)}, d_j) solves A_j w_j = d_j approximately, using a simple iterative method with initial vector w_j^{(0)}.
11.7 Notes
The non-standard form of an operator was invented by Beylkin, Coifman, and Rokhlin in [4], Fast Wavelet Transforms and Numerical Algorithms I. It can be seen as a generalisation of the Fast Multipole Method (FMM) for computing potential interactions, by Greengard and Rokhlin; see A Fast Algorithm for Particle Simulations [15]. Methods for solving equations in the non-standard form have been developed further, primarily by Beylkin; see for example Wavelets and Fast Numerical Algorithms [3]. For an introduction to multigrid and iterative methods we refer the reader to the book A Multigrid Tutorial [5] by Briggs.
Chapter 12
Functional Analysis
Functional analysis in this chapter will mean the study of global differentiability of functions, expressed in terms of their derivatives being, say, square integrable (or belonging to certain Banach spaces). The corresponding wavelet descriptions will be made in terms of orthogonal MRA wavelets (Chapter 4). Wavelet representations are also well suited to describe local differentiability properties of functions; for this, however, we will only give references in the Notes at the end of the chapter.
We now come to the main result of this chapter. The following theorem has an extension to L^p, 1 < p < ∞, cited below. However, the crucial lemma is essentially the same for p = 2.

Theorem 12.1. Under the same assumptions on the function f, now defined on R, i.e., f ∈ L²(R) with f^{(α)} ∈ L²(R), we will show that for the wavelet coefficients w_{j,k} (D^N ψ ∈ L²)

    ‖f^{(α)}‖₂² ∼ Σ_{j,k} |2^{αj} w_{j,k}|²   (0 ≤ α ≤ N)

holds when ∫ x^α ψ(x) dx = 0 for 0 ≤ α ≤ N. Here ∼ denotes that the quotient of the two expressions is bounded from below and from above by positive constants depending only on the wavelet ψ.
    ‖D^α f‖₂² ∼ ‖f‖₂²
Proof of the theorem: Apply the lemma with g(x) = f(2^{−j}x) to obtain ‖D^α f‖₂² ∼ 2^{2jα} Σ_k |w_{0,k}|² when f(x) = Σ_k w_{0,k} 2^{j/2} ψ(2^j x − k). This proves the theorem. (In general, only a summation over the dilation index, j, remains, and the corresponding subspaces are orthogonal.)
Proof of the lemma: Starting from D^α f(x) = Σ_k w_{0,k} D^α ψ(x − k), note that

    |D^α f(x)|² ≤ Σ_k |w_{0,k}|² |D^α ψ(x − k)| · Σ_k |D^α ψ(x − k)|

by the Cauchy–Schwarz inequality. Integrating this yields

    ‖D^α f‖₂² ≤ sup_x Σ_k |D^α ψ(x − k)| · Σ_k |w_{0,k}|² ∫ |D^α ψ(x − k)| dx ≤ C Σ_k |w_{0,k}|².
and

    ‖f‖₂² = ∫∫ |Σ_k D^α f(y) Ψ(y − k) ψ(x − k)|² dx dy ≤ C ‖D^α f‖₂²,
with their absolute values only. This means that the (L²-)convergence of the wavelet expansion is unconditional, in the sense that it will not depend on, for example, the order of summation or the signs (phases) of the coefficients. This is in contrast to, for example, the basis (e^{inωt})_n for L^p(0, 2π), p ≠ 2, which is not unconditional.
Exercises 12.1
12.1. Formulate and prove the analogue in R of the identity (12.1).

12.2. Show that for f(x) = Σ_k w_{0,k} ψ(x − k), with ψ as in Theorem 12.1,

    ‖D^α f‖_p ∼ ‖f‖_p   (0 ≤ α ≤ N)

holds. (Modify the proof of the lemma, where the Cauchy–Schwarz inequality was applied for p = 2, writing out w_{0,k}.)
    ‖f − f_j‖₂ ≤ C 2^{−jα} ‖D^α f‖₂
12.2 Notes
For further material, we refer to the books by Meyer [23], Kahane & Lemarié
[21], Hernández & Weiss [16]. Wavelets and local regularity are treated in a
recent dissertation by Andersson [2], which also contains many references.
Chapter 13
An Analysis Tool
We will give two examples of how the continuous wavelet transform in Chap-
ter 7 may be used in signal processing, and also indicate an algorithmic short-
cut if the wavelet is associated with a multiresolution analysis; see Chapter
4.
    W_ψ f(a, b) = ∫_{−∞}^{∞} f(t) |a|^{−1/2} ψ((t − b)/a) dt

    ∫_0^∞ |ψ̂(ξ)|² dξ/ξ < ∞
where now ψ̂(0) ≈ 0 only, and ω₀ = 5.336. This choice of ω₀ makes the real part of ψ, Re ψ, have its first maximum value outside the origin equal to half the modulus value there. (In order to make ψ̂(0) = 0, a small correction term may be added, e.g., −(2π)^{1/2} exp(−ω²/2 − ω₀²/2).)
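Numerically, ψ̂(0) is already negligible before the correction. A sketch, assuming the common convention ψ̂(ω) = (2π)^{1/2} exp(−(ω − ω₀)²/2) for the Morlet wavelet (the text's normalization may differ):

```python
import math

w0 = 5.336   # Morlet center frequency from the text

def psi_hat(w):
    # Fourier transform of the (uncorrected) Morlet wavelet: a Gaussian
    # bump centered at w0 (an assumed normalization, for illustration).
    return math.sqrt(2 * math.pi) * math.exp(-(w - w0) ** 2 / 2)

def psi_hat_corrected(w):
    # With the correction term, the transform vanishes at the origin.
    return psi_hat(w) - math.sqrt(2 * math.pi) * math.exp(-w * w / 2 - w0 * w0 / 2)

print(psi_hat(0.0))            # tiny but nonzero
print(psi_hat_corrected(0.0))  # essentially 0
```
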
Figure 13.1: Left: Re ψ(t) (dotted), Im ψ(t) (dashed). Right: ψ̂(ω).
[Figure 13.2: A signal (left, plotted against t on [0, 1]) and its continuous wavelet transform (right, plotted against b with log₂(1/a) on the vertical axis).]
Example 13.2. In Figures 13.3, 13.4, 13.5,² the continuous wavelet transforms of three different signals are shown, representing measurements in the velocity field of a fluid. The wavelet is the complex Morlet wavelet shown in Figure 13.1. The figures are to be viewed rotated a quarter turn clockwise relative to their captions. The signals are shown at the top (sic!) for easy reference. On the horizontal axis is the parameter b and on the vertical a, which may be viewed as time and frequency, respectively (but are certainly not exactly that).

¹The transform was calculated by Matlab, using the Morlet wavelet (Figure 13.2) in Wavelab.
²These three figures have been produced by C-F Stein, using a program written and put at his disposal by M. Holschneider.
13.3 Notes
A recent application of the continuous wavelet transform to the analysis of
data concerning the polar motion, the Chandler wobble, can be found in
Gibert et al. [14].
For further reading, we suggest the books by Holschneider [18], and Ka-
hane & Lemarié [21].
Chapter 14
Feature Extraction
For most signal and image classification problems the dimension of the input
signal is very large. For example, a typical segment of an audio signal has a
few thousand samples, and a common image size is 512 × 512 pixels. This
makes it practically impossible to directly employ a traditional classifier,
such as linear discriminant analysis (LDA), on the input signal. Therefore
one usually divides a classifier into two parts. First, one maps the input signal
into some lower dimensional space containing the most relevant features of
the different classes of possible input signals. Then, these features are fed
into a traditional classifier such as an LDA or an artificial neural net (ANN).
Most literature on classification focuses on the properties of the second step.
We will present a rather recent wavelet-based technique for the first step,
that is, the feature extractor. This technique expands the input signal into
a large time-frequency dictionary consisting of, for instance, wavelet packets
and local trigonometric bases. It then finds, using the best-basis algorithm,
the one basis that best discriminates the different classes of input signals.
We refer to these bases as local discriminant bases.
number of class 1 and 2 signals, respectively. We use the notation x_i^{(y)} to denote that a signal in the training set belongs to class y. Let P(A, y) be a probability distribution on X × Y, where A ⊂ X and y ∈ Y. This means that

    P(A, y) = P(X ∈ A, Y = y) = π_y P(X ∈ A | Y = y),

where X ∈ X and Y ∈ Y are random variables. Here π_y is the prior probability of class y, which is usually estimated as π_y = N_y/N, y = 1, 2. The optimal classifier for this setup is the Bayes classifier. To obtain it we need an estimate of P(A, y). The number of training samples is small compared to the size of the input signals, though; that is, N ≪ n. This makes it impossible to reliably estimate P(A, y) in practice.
Feature extraction resolves this problem of the high dimensionality of the
input signal space. It is essential to extract relevant features
of the signal for classification purposes. In practice, it is known that multi-
variate data in Rn are almost never n-dimensional. Rather, the data exhibit
an intrinsic lower dimensional structure. Hence, the following approach of
splitting the classifier into two functions is often taken:
d = g ◦ f.
Here, f : X → F is a feature extractor mapping the input signal space
into a lower dimensional feature space F ⊂ Rm . The dimension m of the
feature space is typically at least ten times smaller than the dimension n
of the input signal space. The feature extractor is followed by a traditional
classifier g : F → Y, which should work well if the different signal classes are
well separated in the feature space. We will describe an automatic procedure
for constructing the feature extractor given the training data.
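In code, the split d = g ◦ f might look as follows. Everything here is a
hypothetical stand-in (a variance-based feature map for f and a
nearest-centroid rule for g), chosen only to make the two-stage structure
concrete; it is not the LDB construction itself.

```python
import numpy as np

def fit_feature_extractor(X_train, m):
    """A stand-in for f: keep the m coordinates with largest variance
    over the training set (hypothetical; not the LDB feature map)."""
    idx = np.argsort(np.var(X_train, axis=0))[::-1][:m]
    return lambda x: x[..., idx]

def fit_nearest_centroid(F_train, y_train):
    """A stand-in for the traditional classifier g: F -> Y."""
    classes = np.unique(y_train)
    centroids = np.array([F_train[y_train == c].mean(axis=0) for c in classes])
    def g(feat):
        return classes[np.argmin(((centroids - feat) ** 2).sum(axis=1))]
    return g

# toy data: the class difference lives entirely in coordinate 0
rng = np.random.default_rng(0)
n, N = 64, 40
X1 = rng.normal(size=(N, n)); X1[:, 0] += 5.0
X2 = rng.normal(size=(N, n)); X2[:, 0] -= 5.0
X = np.vstack([X1, X2])
y = np.array([1] * N + [2] * N)

f = fit_feature_extractor(X, m=4)    # feature extractor f: R^64 -> R^4
g = fit_nearest_centroid(f(X), y)    # classifier g on the feature space
d = lambda x: g(f(x))                # the combined classifier d = g o f
```

Here m = 4 is far smaller than n = 64, in the spirit of the ten-fold (or more)
dimension reduction mentioned above.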
where ψ_1, ..., ψ_n are the basis vectors of the best basis. Applying the
transpose of this matrix to an input signal x gives us the coordinates of the
signal in the best basis (we regard x and the ψ_i's as column vectors in R^n):

W^t x = (ψ_1^t x, ..., ψ_n^t x)^t = (⟨x, ψ_1⟩, ..., ⟨x, ψ_n⟩)^t.

d = g ◦ (P_m W^t),

W = argmax_{B ∈ D} ∆(B).

Z_i = ⟨X, ψ_i⟩.
Then Z_i is also a random variable, and we sometimes write Z_i^{(y)} to
emphasize that the random variable X is of class y. Now, we are interested in
the probability density function (pdf) of Z_i for each class y, which we
denote by q_i^{(y)}(z). We can estimate these pdfs by expanding the available
signals in the training set into the basis functions of the dictionary. An
estimate q̂_i^{(y)} of q_i^{(y)} can then be computed with a pdf estimation
technique such as averaged shifted histograms (ASH).
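A minimal ASH estimator (our own sketch, not the implementation referred to
above) averages m histograms of bin width h whose origins are shifted by h/m;
equivalently, it weights a histogram with fine bin width h/m by the triangular
kernel (1 − |i|/m):

```python
import numpy as np

def ash(data, lo, hi, h, m):
    """Averaged shifted histogram: average of m histograms with bin width h
    and origins shifted by h/m.  Implemented as a histogram with fine bin
    width h/m, smoothed by the triangular weights (1 - |i|/m)."""
    delta = h / m
    nbins = int(round((hi - lo) / delta))
    counts, edges = np.histogram(data, bins=nbins, range=(lo, hi))
    w = 1.0 - np.abs(np.arange(-(m - 1), m)) / m
    dens = np.convolve(counts, w, mode="same") / (len(data) * h)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, dens

rng = np.random.default_rng(1)
z = rng.normal(size=2000)
centers, dens = ash(z, lo=-5.0, hi=5.0, h=0.5, m=8)  # pdf estimate of N(0, 1)
```

The resulting estimate is nonnegative and integrates to (approximately) one
over the chosen range.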
Once we have estimated the pdfs q̂_i^{(1)} and q̂_i^{(2)} for class 1 and 2, we
need a distance function δ(q̂_i^{(1)}, q̂_i^{(2)}) that measures the ability of
the direction ψ_i to separate the two classes. If the two pdfs are similar, δ
should be close to zero. The best direction is the one for which the two pdfs
look most different from one another; for this direction δ should attain a
maximum positive value. There are several ways to measure the discrepancy
between two pdfs, of which we mention

δ(p, q) = ∫ (√p(z) − √q(z))² dz        (Hellinger distance)

δ(p, q) = ( ∫ (p(z) − q(z))² dz )^{1/2}        (ℓ²-distance)
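On pdfs discretized on a common grid with spacing dz, the two distances can be
computed directly (a sketch under that discretization assumption):

```python
import numpy as np

def hellinger(p, q, dz):
    """Integral of (sqrt p - sqrt q)^2 on a grid with spacing dz;
    0 for identical pdfs, at most 2 for pdfs with disjoint supports."""
    return float(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2) * dz)

def l2_distance(p, q, dz):
    """(Integral of (p - q)^2)^(1/2) on the same grid."""
    return float(np.sqrt(np.sum((p - q) ** 2) * dz))

# two Gaussian pdfs on a grid, separated by two standard deviations
z = np.linspace(-5.0, 5.0, 1001)
dz = z[1] - z[0]
p = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
q = np.exp(-0.5 * (z - 2.0) ** 2) / np.sqrt(2 * np.pi)
```

Both functions return 0 when the two pdfs coincide and grow as the pdfs
separate, as required of δ above.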
Further, let {δ_(i)} be the decreasing rearrangement of {δ_i}, that is, the
discriminant powers sorted in decreasing order. The discriminant function for
the basis B is then finally defined as the sum of the k (< n) largest
discriminant powers:

∆(B) = Σ_{i=1}^{k} δ_(i).
1. Expand all the signals in the training set into the time-frequency dic-
tionary D.
2. Estimate the projected pdfs q̂_i^{(y)} for each basis vector ψ_i and class y.
All of these steps take no more than O(n log n) operations (see Chapter
9), making this algorithm computationally efficient.
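For one fixed basis, the ranking of coordinates by discriminant power can be
sketched as follows. This is a sketch only: plain histograms stand in for ASH,
δ is taken to be the Hellinger distance, and the best-basis search over the
whole dictionary D is omitted.

```python
import numpy as np

def hellinger(p, q, dz):
    return float(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2) * dz)

def discriminant_powers(Z1, Z2, bins=32, lo=-10.0, hi=10.0):
    """delta_i: Hellinger distance between the estimated class pdfs of the
    i-th basis coordinate.  Z1, Z2 are (N_y, n) coefficient arrays."""
    dz = (hi - lo) / bins
    deltas = np.empty(Z1.shape[1])
    for i in range(Z1.shape[1]):
        q1, _ = np.histogram(Z1[:, i], bins=bins, range=(lo, hi), density=True)
        q2, _ = np.histogram(Z2[:, i], bins=bins, range=(lo, hi), density=True)
        deltas[i] = hellinger(q1, q2, dz)
    return deltas

# toy coefficients: coordinate 0 separates the classes, the rest are noise
rng = np.random.default_rng(2)
Z1 = rng.normal(size=(200, 8)); Z1[:, 0] += 4.0
Z2 = rng.normal(size=(200, 8)); Z2[:, 0] -= 4.0

deltas = discriminant_powers(Z1, Z2)
order = np.argsort(deltas)[::-1]           # decreasing rearrangement delta_(i)
k = 3
Delta_B = float(deltas[order[:k]].sum())   # Delta(B), sum of k largest powers
```

On this toy data the discriminating coordinate is ranked first, and ∆(B) is
dominated by its discriminant power.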
14.5 Notes
Local discriminant bases for classification purposes were introduced by Saito
and Coifman in the paper Local discriminant bases [25]. Improvements to
the LDB method were later described by the same authors in [26]. They have
performed successful experiments on geophysical acoustic waveform classifi-
cation, radar signal classification, and classification of neuron firing patterns
of monkeys.
Chapter 15
Implementation Issues
This chapter is concerned with two issues related to the actual implementation
of the discrete wavelet transform. The first is finite-length signals and how
to extend, or otherwise treat, them. The second is how to process the sample
values of a continuous-time function before the discrete wavelet transform is
implemented in a filter bank. (The discrete wavelet transform was defined in
Chapter 4.)
Extension of Signals
We will describe three extension methods: extension by zeros (zero-padding),
extension by periodicity (wraparound), and extension by reflection (symmet-
ric extension). See Figure 15.1.
Zero-padding simply sets the values outside the signal to zero, extending it
to an infinite sequence.
For signals that are naturally periodic, wraparound is better than
zero-padding. The discrete Fourier transform (DFT) of a vector in R^L is the
convolution of the periodic extension of the vector with the filter
1, W, ..., W^{L−1}, where W = e^{−i2π/L}.
[Figure 15.1: zero-padding, wraparound, and symmetric extension of a signal.]
The two symmetric extension techniques give rise to two versions of the
discrete cosine transform (DCT). The DCT is used in the JPEG image com-
pression standard, for example.
In continuous time, symmetric extension gives a continuous function; this is
its advantage. It does, however, introduce a jump in the first derivative.
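The three extensions are easy to write down; here is a sketch (the half-point
convention for symmetric extension, which repeats the boundary sample, is one
of two possible choices):

```python
import numpy as np

def extend(x, n, mode):
    """Extend a finite signal by n samples at each end.
    mode: 'zero' (zero-padding), 'wrap' (periodic), or 'symm'
    (symmetric, half-point convention: the boundary sample is repeated)."""
    x = np.asarray(x)
    if mode == "zero":
        return np.concatenate([np.zeros(n), x, np.zeros(n)])
    if mode == "wrap":
        return np.concatenate([x[-n:], x, x[:n]])
    if mode == "symm":
        return np.concatenate([x[n - 1::-1], x, x[:-n - 1:-1]])
    raise ValueError(f"unknown mode {mode!r}")
```

For x = (1, 2, 3, 4) and n = 2, wraparound gives 3, 4, 1, 2, 3, 4, 1, 2, while
symmetric extension gives 2, 1, 1, 2, 3, 4, 4, 3.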
Boundary-Corrected Filters
Boundary-corrected filters are quite complicated to construct. In continuous
time this corresponds to defining a wavelet basis for a finite interval [0, L].
Such constructions are still at the research stage. Let us therefore describe
one example of boundary-corrected wavelets.
The example we will consider uses the piecewise linear hat function as the
scaling function. It is supported on two intervals, and the corresponding
wavelet on three intervals; see Figure 15.2. These are the synthesis
functions. The filters are 0.5, 1, 0.5 and 0.1, −0.6, 1, −0.6, 0.1,
respectively. This is an example of a semi-orthogonal wavelet basis. The
approximation space V0 is orthogonal to the detail space W0, but the basis
functions within those spaces are not orthogonal; the hat function is not
orthogonal to its translates. This basis is useful when discretizing certain
differential equations, using the
[Figure 15.2: the scaling function in V0 (left) and the wavelet in W0 (right).]
Galerkin method. Then, one does not need to know the dual scaling function
and wavelet. See Chapter 11 for details on how to solve differential equations
using wavelets.
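The semi-orthogonality claim can be checked numerically: build ϕ and ψ from
the filters above on a fine grid and verify that ψ is orthogonal to ϕ and its
translates, while ϕ is not orthogonal to its own translate. A quadrature
sketch (our own check, not code from the book):

```python
import numpy as np

dt = 1e-3
t = np.arange(0.0, 4.0, dt)      # fine grid covering supp phi and supp psi

def hat(s):
    """Piecewise linear hat function on [0, 2], peak 1 at s = 1."""
    return np.maximum(0.0, 1.0 - np.abs(s - 1.0))

# synthesis relations with the filters from the text:
# phi(t) = sum_k h_k hat(2t - k),  psi(t) = sum_k g_k hat(2t - k)
h = [0.5, 1.0, 0.5]
g = [0.1, -0.6, 1.0, -0.6, 0.1]
phi = sum(c * hat(2 * t - k) for k, c in enumerate(h))   # reproduces hat(t)
psi = sum(c * hat(2 * t - k) for k, c in enumerate(g))   # supported on [0, 3]

def ip(u, v):
    """Inner product by Riemann sum on the grid."""
    return float(np.sum(u * v) * dt)

# V0 is orthogonal to W0 ...
print(ip(hat(t), psi), ip(hat(t - 1.0), psi))   # both approximately 0
# ... but the hat function is not orthogonal to its translates
print(ip(hat(t), hat(t - 1.0)))                 # approximately 1/6
```

The inner product of the hat with its unit translate is 1/6, which is exactly
why the basis is only semi-orthogonal.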
Finally, in Figure 15.3, we see two boundary-corrected wavelets. Depending on
whether the differential equation has Dirichlet or Neumann conditions at the
boundary, we may want to force the wavelet values at the boundary to 0.
[Figure 15.3: two boundary-corrected wavelets.]
Pre-filtering
Suppose we are given the sample values f (2−J k) of a lowpass filtered signal
f , with fˆ(ω) = 0 for |ω| ≥ 2J π. These sample values will be related to the
approximation of f at the scale 2−J , through the projection
f_J(t) := P_J f = Σ_k ⟨f, ϕ_{J,k}⟩ ϕ_{J,k}(t).
A relation appears if we compute the scaling coefficients s_{J,k} = ⟨f, ϕ_{J,k}⟩
approximately with some numerical integration method, e.g., using a rectangle
approximation:

s_{J,k} = ∫ f(t) ϕ_{J,k}(t) dt
        ≈ 2^{−J} Σ_l f(2^{−J}l) ϕ_{J,k}(2^{−J}l)
        = 2^{−J/2} Σ_l f(2^{−J}l) ϕ(l − k).
Note that the last expression is a filtering of the samples of f, where the
filter coefficients are 2^{−J/2} ϕ(−l). This is called pre-filtering. There
exist other pre-filtering methods: if, for instance, f is band-limited, this
can be taken into account to compute the scaling coefficients s_{J,k} more
accurately. It is common practice to use the sample values directly as the
scaling coefficients, which then introduces an error. This error has its main
influence at the smallest scales, that is, for j < J close to J (see Exercise
15.1).
Post-filtering
The sample values can be reconstructed approximately through a filtering of
the scaling coefficients s_{J,k} with the filter coefficients 2^{J/2} ϕ(k):

f(2^{−J}k) ≈ f_J(2^{−J}k) = Σ_l s_{J,l} ϕ_{J,l}(2^{−J}k) = 2^{J/2} Σ_l s_{J,l} ϕ(k − l).
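Both pre- and post-filtering are then ordinary correlations and convolutions
with the integer samples ϕ(l). The sketch below assumes such a table phi_int
is given; the mode="same" alignment is an implementation choice, and the round
trip is exact only in the trivial case ϕ(0) = 1, ϕ(l) = 0 otherwise (the Haar
scaling function):

```python
import numpy as np

def prefilter(samples, phi_int, J):
    """Pre-filtering: s_{J,k} ~ 2^(-J/2) sum_l f(2^(-J) l) phi(l - k),
    a correlation of the samples with phi_int = [phi(0), phi(1), ...]."""
    phi = np.asarray(phi_int, dtype=float)
    return 2.0 ** (-J / 2) * np.correlate(np.asarray(samples, float),
                                          phi, mode="same")

def postfilter(coeffs, phi_int, J):
    """Post-filtering: f(2^(-J) k) ~ 2^(J/2) sum_l s_{J,l} phi(k - l),
    a convolution of the coefficients with the same table."""
    phi = np.asarray(phi_int, dtype=float)
    return 2.0 ** (J / 2) * np.convolve(np.asarray(coeffs, float),
                                        phi, mode="same")

# Haar case phi(0) = 1: pre- followed by post-filtering returns the samples
samples = np.array([1.0, 2.0, 3.0, 4.0])
s = prefilter(samples, [1.0], J=3)
recovered = postfilter(s, [1.0], J=3)
```

For a general scaling function the round trip is only approximate, which is
precisely the error discussed above and in Exercise 15.1.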
15.2 Exercises
15.1. Suppose that we are given the sample values f(2^{−J}k) of a lowpass
filtered signal f, with f̂(ω) = 0 for |ω| ≥ 2^J π. For ω in the pass-band,
verify that

f̂(ω) = 2^{−J} Σ_k f(2^{−J}k) e^{−i2^{−J}kω},

and that, when the sample values are used directly as the scaling coefficients,

f̂_J(ω) = 2^{−J/2} ϕ̂(2^{−J}ω) Σ_k f(2^{−J}k) e^{−i2^{−J}kω}.

This indicates how a filter might be constructed to compensate for the
influence of the scaling function.
15.3 Notes
The overview article of Jawerth and Sweldens [20] describes how to define
orthogonal wavelets on an interval. The book [27] by Nguyen and Strang
discusses finite length signals and also contains useful references for further
reading.
Bibliography
[3] G. Beylkin, Wavelets and fast numerical algorithms, Lecture Notes for
short course, AMS-93, Proceedings of Symposia in Applied Mathemat-
ics, vol. 47, 1993, pp. 89–117.
[7] C. K. Chui, An Introduction to Wavelets, Academic Press, New York, 1992.
[29] W. Sweldens and P. Schröder, Building your own wavelets at home, Tech.
report, University of South Carolina, Katholieke Universiteit Leuven,
1995.
Index
subsampling lattice, 94
symmetric, 25
synthesis, 38
threshold, 169
time-frequency atom, 145
transfer function, 21
tree
wavelet packet, 149
upsampling, 38, 40
vanishing moment, 79
wavelet, 63
wavelet decomposition, 65
wavelet equation, 64
wavelet packet tree, 149
wavelet packets, 151
web, 14
white noise, 166
window function, 147, 156
z-transform, 20