DSP Application Notes
DSP Application Notes
Course Description
Builds on the fundamentals of Digital Signal Processing to show specic examples of signal
Content
24 lectures & associated homeworks and projects.
Texts
• Discrete-Time Signal Processing (2nd Ed.), Oppenheim and Schafer, Prentice-Hall,
1999.
Syllabus
1. Brief review of DSP
3. Adaptive ltering
4. Spectral analysis
5. Speech processing
1
Marking
• 15% - Three homework assignments, one every 3 weeks (after every 6 lectures), with
• 15% - For a Matlab based practical assignment due before the easter break. A choice
2
1 Brief review of DSP fundamentals
1.1 Nyquist's Sampling Theorem
The ideal sampling of a signal, f (t), is the same as multiplying it by an impulse train,
δT (t) = Σ∞
n=−∞ δ(t − nT ). The resulting signal is (c.f. Figure 1)
∞
X
δT = cn ejnω0 t ,
n=−∞
where,
Z π
ω0 ω0 T π
cn = δT (t)e−jω0 t dt, N ote : = .
2π −π 2 ω0
ω0
1
=
T
Therefore,
∞
1 X
f¯(t) = f (t)ejnω0 t .
T n=−∞
3
Take the Laplace transform of both sides gives
∞
1 X
F̄ (s) = F (s − jnω0 )
T n=−∞
∞
1 X
F̄ (jω) = F (j(ω − nω0 )) .
T n=−∞
Therefore the spectrum of f¯(t) consists of an innite number of copies of the spectrum of
f (t) shifted to be centred on the multiples of the sampling frequency and scaled by the
1
amount
T (c.f. Figure 2).
ω0
If the smallest frequency in f (t) is less than
2 then the signal may be perfectly re-
constructed, given an ideal analog lowpass lter. This is Nyquist's sampling theorem for
baseband signals. We can also show for bandpass signals that the signal only needs to be
1
Figure 2: Ideal sampling spectrum. The spectrum is scaled by
T and copied to multiples
of ω0 . An ideal analog reconstruction lter is shown (dashed).
4
1.2 Discrete-Time Fourier Transform (DTFT)
The continuous-time Fourier Transform of a signal, f (t), is dened as
Z ∞
F (jω) = f (t)e−jωt dt
−∞
If f (t) is sampled by impulses, then we are working with the signal f¯(t):
∞
X
f¯(t) = f (nT )δ(t − nT ).
n=−∞
Z ∞
F (jω) = f¯(t)e−jωt dt
−∞
Z ∞ ∞
X
= f (nT )δ(t − nT )e−jωt dt
−∞ n=−∞
∞
X Z ∞
= f (nT ) δ(t − nT )e−jωt dt
n=−∞ −∞
X∞
= f (nT )e−jnωT
n=−∞
F (jω) denotes the spectrum of the sampled version of f (t). The transform is called the
5
1.3 z -transform
Let f [n] be a sequence obtained by sampling the signal f (t) every T seconds (i.e. at
∞
X
F̃ (z) = f [n]z −n .
n=−∞
z is a complex variable. We note that if z = ejωT then this transform is identical to the
DTFT (i.e., F (jω) = F̃ (ejω )). Also note that as ω varies, z = ejωT traces out the locus of
Figure 3: Unit circle in the z -plane traced out by the function z = ejωT
6
Example: Find the DTFT magnitude and phase given H̃(z)
The transfer function of a digital lter is given by the z -function, H̃(z), nd the magnitude
and phase response.
H̃(z) = 1 + z −1
H̃(ejωT ) = 1 + e−jωT
= 1 + cos(ωT ) − j sin(ωT )
q
H̃(ejωT ) = (1 + cos(ωT ))2 + sin2 (ωT )
√ p
= 2 1 + cos(ωT )
− sin(ωT )
∠H̃(ejωT ) = arctan
1 + cos(ωT )
!
− sin(2[ ωT
2 ])
= arctan
1 + cos(2[ ωT2 ])
!
−2 sin( ωT
2 ) cos( ωT
2 )
= arctan
2 cos2 ( ωT
2 )
!
ωT
− sin( 2 )
= arctan
cos( ωT
2 )
ωT
= −
2
7
1.4 Inverse z -transform
Ocial version:
F̃ (z)z n
I
1
f [n] = dz,
2πj C z
where C is an appropriately chosen contour in the region of convergence of the z -plane.
However, in practice there are many simple tricks for nding the inverse z -transform.
1
H̃(z) =
z(z − 1)(2z − 1)
Using partial fraction expansion we get:
z 2z
H̃(z) = z −1 1 − −
z − 1 z − 0.5
where u(n) is a unit step and δ(n) is a delta impulse. Alternatively we could use long
Given a function, H̃(z), the sequence obtained by taking the inverse z -transform, h[n],
is bounded only if the poles of the function lie inside the unit circle. If the function
H̃(z) describes the frequency response of a lter, this sequence h[n] is called the impulse
response.
8
2 Properties and Design of Filters
2.1 Frequency Transfer Functions
The frequency response of a linear digital lter may be represented by the transfer function
H̃(z). Suppose we know the z -transform of the input signal, x[n], is X̃(z). Therefore, we
can nd the output of the lter, y[n], since its z -transform is dened as
Ỹ (z) = H̃(z)X̃(z).
The transfer function is usually given as, or can be reduced to, the ratio of two polynomials.
For example,
Rearranging once more gives a linear constant coecient dierence equation which can be
1
y[n] = [(b0 x[n] + b1 x[n − 1] + ... + bM x[n − M ]) − (a1 y[n − 1] + ... + aN y[n − N ])] .
a0
In a nite impulse response (FIR) lter all ai = 0 for i>0 there is no feedback. FIR
lters are always stable. An innite impulse reponse (IIR) lter is unstable if the poles
(where the denominator is zero) are outside the unit circle dened by |z| = 1, i.e., z = ejωT .
We will see later that, when using nite precision arithmetic, it is sometimes dicult to
9
2.2 Pole-Zero plots
Another way to examine the transfer function is factorise the numerator and denominator
and see where the poles and zeros lie in the complex z -plane. For example we would
QM −1
B̃(z) b0 + b1 z −1 + b2 z −2 + ... + bM z −M i=1 1 − zi z
H̃(z) = = = QN .
Ã(z) a0 + a1 z −1 + a2 z −2 + ... + aN z −N −1
i=1 (1 − pi z )
z −M M
Q
i=1 (z − zi )
H̃(z) = −N QN .
z i=1 (z − pi )
Now if we want to know the steady state frequency response of the lter we set z = ejωT ,
−jωT M
QM jωT − z
e i=1 e i
H̃(ejωT ) = −jωT N QN .
e i=1 (e
jωT − p )
i
Next nd the magnitude spectrum, H̄(jω) = H̃(ejωT ). (remember |ejθ | = 1)
In words, the magnitude response of the lter at a frequency ω is the product of the
distances from e
jωT to each of the zeros, divided by the product of the distances of ejωT
to each of the poles.
10
Example: Sketch the magnitude response of h[n] = [1, −0.25, −0.125]
Use a pole-zero plot to sketch the magnitude of the frequency response of the lter whose
H̃(ejωT ) = ejωT − 0.5 ejωT + 0.25
= L1 .L2
Figure 4: An example of a pole-zero plot with two real zeros at z = −0.25 and z = 0.5.
The magnitude of the frequency response at z= ejωT is the product of L1 and L2 .
on the unit circle. Figure 5 shows a plot of the magnitude response of the lter, h[n] =
11
[1, −0.25, −0.125].
1 5 5
ωT = 0, L1 = , L2 = ⇒ H̃(ejωT ) =
2s 4 8s
2 √ 2 √ √85
π 1 2
5 1 2
17
jωT
ωT = , L1 = + (1) = , L2 = + (1) = ⇒ H̃(e ) =
2 2 2 4 4 8
3 3
9
ωT = π, L1 = , L2 = ⇒ H̃(ejωT ) =
2 4 8
1.3
1.2
1.1
1
Magnitude
0.9
0.8
0.7
Figure 5: The magnitude of the frequency response of the lter H̃(z) = 1 − 0.25z −1 −
0.125z −2
It is sometimes intuitively helpful to think of the zeros as depressions in the z -plane.
The magnitude of the frequency response is then the height of the absolute value of the
12
1
0.5
0
log10(|H(z)|)
−0.5
1.5
−1
1
−1.5 0.5
−2 0
−1.5 −0.5
−1
−0.5
0 −1
0.5
1 −1.5 Im[z]
1.5
Re[z]
13
Example: Place a spectral null at ωT = 2π
3
2π
We need a zero on the unit cirlce at z1 = ej 3 . However if we want the lter coecients
4π
to be real we must put a zero at the complex conjugate position, z2 = ej 3 . This gives the
2π
4π
H̃(z) = z − ej 3 z − ej 3
2π
4π
H̃(ejωT ) = ejωT − ej 3 ejωT − ej 3 = L1 L2
or
This is essentially a low pass lter. We would have expected as much since the output
14
of the lter, y[n], is the sum of the present input and two previous inputs:
0
log (|H(z)|)
−1
10
−2 1.5
1
−3
−1.5 0.5
−1
0
−0.5
0 −0.5
0.5
−1
1
−1.5
Im[z]
1.5
Re[z]
15
Example: Implementing an ad-hoc notch lter at ωT = 2π
3
Suppose we want to augment the lter above by adding a pole, we must keep the pole
inside the unit circle and we want to place it as near as possible to the zero. Let's place
2π
it at z = p1 = 0.95ej 3 . But we must also place one at the complex conjugate position to
4π
keep the lter coecients real: z = p2 = 0.95ej 3 . Now we have a new lter,
2π
4π
z − ej 3 z − ej 3
H̃(z) = 2π
4π
z−j 3 z − 0.95ej 3
1 + z −1 + z −2
=
1 + (0.95)z −1 + (0.95)2 z −2
j 2π j 4π
z − e z − e
3
3
jωT
H̃(e ) =
2π 4π
z − 0.95ej 3 z − 0.95ej 3
L1 L2
= .
M 1 M2
Notice that when when z is far away from a pole-zero pair, the absolute value of their
2π
z − ej 3
R̃(z) = '1
2π
z − 0.95ej 3
2π
for z far away from ej 3 . This is exactly what we want, a zero at the specied frequency
and approximately unity gain elsewhere. We can visualise this using the pole-zero plot in
Figure 9. Figure 10 shows the function H̃(z) evaluated over the z -plane. Again, we nd
the magnitude of the frequency spectrum, H̃(e
jωT ), by circumnavigating the unit circle,
z = ejωT . A plot of H̃(ejωT ) is shown in Figure 11.
We have seen how moving zeros and poles around the z -plane allows us to approximately
design lters with a desired frequency response. Iterative computer based techniques do
this when trying to nd design a lter with an arbitrary frequency response they have a
guess and then move the poles and zeros to improve their guess. One famous algorithm is
16
2π
Figure 9: Pole-Zero plot for a notch lter at ωT = 3 .
0.5
0
log10(|H(z)|)
−0.5
1.5
−1
1
−1.5
0.5
−2 0
−1.5 −0.5
−1
−0.5
0 −1
0.5
1 −1.5 Im[z]
1.5
Re[z]
1+z −1 +z −2
Figure 10: The log-magnitude of the function H̃(z) = 1+0.95z −1 +(0.95)2 z −2
evaluated over
the z -plane. Also shown is the unit circle, z = ejωT . We see that the poles `pull' the
surface back up in the vicinity of the zeros.
17
1.4
1.2
0.8
Magnitude
0.6
0.4
0.2
0
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
ωT (π)
Figure 11: The magnitude spectrum ,
H̃(ejωT ), of the notch lter.
1 1 − pz −1
Ã(z) = − .
p 1 − 1∗ z −1
p
1
This lter has a zero at z = p and a pole at z = p∗ , where ∗ denotes the complex conjugate.
We can examine what the magnitude of the frequency response is:
− p1 + z −1
Ã(z) =
1 − p1∗ z −1
jωT
− p1 + e−jωT
Ã(e ) =
1 − p1∗ e−jωT
e−jωT 1 − p1 ejωT
=
1 − p1∗ e−jωT
−jωT b
= e
b∗
18
jωT
Ã(e ) = 1
The magnitude response of the lter at all frequencies is unity. So what! Well suppose
we have designed a lter which has a pole, p, outside the unit circle. This lter will be
unstable. However, the allpass lter removes that pole by placing a zero there, and a new
1
pole is placed at the position
p∗ (the distance from the origin is now the inverse of the
1
original distance). There is also a − factor to ensure the gain is one. We can make any
p
IIR lter stable with the same magnitude response. Unfortunately the phase is altered.
19
Example:
Make the lter with the following transfer function stable:
1 1
H̃(z) = = .
(1 − (1 − j)z −1 ) (1 − (1 + j)z −1 ) 1− 2z −1 + 2z −2
0
Magnitude (dB)
−5
−10
−15
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
ωT (π)
400
350
300
Phase (degrees)
250
200
150
100
50
0
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
ωT (π)
Figure 12: The magnitude and frequency response of the unstable lter H̃(z) =
1 1
(1−(1−j)z −1 )(1−(1+j)z −1 )
= 1−2z −1 +2z −2 .
This lter has poles at z = (1 − j) and z = (1 + j). We transform one pole at a time
−1 1 −1 1
H̃Stable (z) =
(1 − j) 1 − 1 z −1 (1 + j) 1 − 1 z −1
1+j 1−j
1 1
=
2 1 − 2(0.5)z −1 + 12 z −2
1
=
2 − 2z −1 + z −2
This stable lter has poles at z = 0.5 ± j0.5. Since they are inside the unit circle the lter
is stable.
20
5
Magnitude (dB)
−5
−10
−15
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
ωT (π)
−10
−20
Phase (degrees)
−30
−40
−50
−60
−70
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
ωT (π)
Figure 13: The magnitude and frequency response of the stablised lter H̃Stable (z) =
1
. The magnitude response is the same but the phase response is dierent.
2−2z −1 +z −2
on analog lter design, so we can steal some of their ideas and make them work in the
digital domain. Before we see how to make them work in the digital domain, let's revise
1
|H(jω)|2 = 2n
ω
1+ ωc
We are usually given a power transfer specication which we need to meet using this
transfer function.
1
2n ≥ Gp
ωp
1+ ωc
2n
ωp 1
≤ −1 (1)
ωc Gp
21
Figure 14: The typical specication of a Butterworth lowpass analog lter.
and
1
2n ≤ Gs
ωs
1+ ωc
2n
ωs 1
≥ −1 (2)
ωc Gs
1
log Gs − 1 − log G1p − 1
n≥ .
2log ωωps
ωp ωs
1 ≤ ωc ≤ 1 .
1 2n 1 2n
Gp −1 Gs −1
22
Example: Design an analog Butterworth to meet the spec
ωp = 0.726
Gp = 0.8
ωs = 1.376
Gs = 10−2
We start by choosing n:
ωp 0.726
ωc ≥ 1 = 1 = 0.8339
(0.25) 10
1 2n
Gp −1
ωs 1.376
ωc ≤ 1 = 1 = 0.869
1
−1
2n (99) 10
Gs
We look up the lter tables and see that the transfer function for a 5th order analog
1
H(s) = 5 4 3 2 .
s s s s s
ωc + 3.2361 ωc + 5.2361 ωc + 5.2361 ωc + 3.2361 ωc +1
23
2.4.2 Bilinear transform
We have just revised how to design an analog lter (choose the parameters) given a speci-
cation. But how do we use this design technique to build a digital lter? One commonly
used technique is the Bilinear Transform. We take a transfer function for an analog lter
QM
i=1 (s − zi )
HA (s) = QN
i=1 (s − pi )
1 − z −1
s→ ,
1 + z −1
1 − z −1
H̃D (z) = HA
1 + z −1
The frequency response of the analog lter, HA (s), is given by setting s = jωA , the
frequency response of the digital lter, H̃(z), is given by setting z = ejω D T (ωA denotes
analog frequency, ωD denotes digital frequency). So the digital and analog lters have the
1 − z −1
s =
1 + z −1
1 − e−jωD T
jωA = −jωD T
1 + e
jωD T
= tanh
2
ωD T
= j tan
2
ωD T
ωA = tan
2
This squeezes the entire frequency range of the analog lter, ωA , into the range [0, π] of the
normalised digital frequency ωD T . So if we are given the design specication of a digital
lter we:
1. prewarp the specied ωD frequencies to get the specs for the analog lter: ωA =
ωD T
tan 2
24
Example: Design a 5th order lowpass butterworth digital lter using Bilinear
Transform
The sampling time is T = 10−3 seconds. The digital lter specication is
ωDp = 2π(200)
GDp = 0.8
ωDs = 2π(300)
GDs = 10−2
Prewarping gives
2(200)π(10−3 )
ωAp = tan = 0.726
2
GAp = 0.8
2(300)π(10−3 )
ωAs = tan = 1.376
2
GAs = 10−2
These are the same specications for the butterworth lter we designed earlier with n=5
and ωc = 0.85. We look up the lter tables and see that the transfer function for a 5th
order analog butterworth lowpass lter is
1
H(s) = 5 4 3 2 .
s s s s s
ωc + 3.2361 ωc + 5.2361 ωc + 5.2361 ωc + 3.2361 ωc +1
1
H̃(z) = 5 4 3 2 .
1 1−z −1 1 1−z −1 1 1−z −1 1 1−z −1 1 1−z −1
0.85 1+z −1
+ 3.2361 0.85 1+z −1
+ 5.2361 0.85 1+z −1
+ 5.2361 0.85 1+z −1
+ 3.2361 0.85 1+z −1
+1
From here it is trivial (if a little soul-destroying) to determine the lter coecients. Hint:
5
multiply top and bottom by 0.85 1 + z −1 and then thresh it out. Figure 15 shows the
25
1.4
1.2
1
ωT (π): 0.3997803
Magnitude: 0.9091107
0.8
Magnitude
0.6
0.4
0.2
ωT (π): 0.6013184
Magnitude: 0.08687025
0
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
ωT (π)
Figure 15: The magnitude of the frequency reponse of a 5th order digital butterworth
constructed using the bilinear transform. Notice that the response is zero at ωT = π .
Marked are pass and stop band crital frequencies ωDp andωDs which occur at f = 200Hz
and f = 300Hz respectively when the sample time is T = 10−3 .
• In an ideal world we would have a train of impulses, which represent the sampled
signal, and we would pass the impulse train through an ideal lowpass lter.
• The ideal lter would only pass the portion of the spectrum related to the original
signal and would block all copies of the spectrum which appeared at multiples of the
26
1. It is impossible, in practice, to produce an impulse train. It is very, very dicult to
2. Even if we could, it is impossible to make an ideal lter but his is not a huge
pass lter. The digital-to-analog (DAC) converter encorperates a zero-order hold mecha-
When the impulse response is convolved with ideally sampled signal a staircase approx-
From your electronic circuits course you should remember that an ADC can be imple-
mented with an opamp and some resistors (c.f. Figure 19). The staircase occurs because
every T seconds a new sample is read from the computer memory to the input of the ADC.
We would expect that lowpass ltering this staircase approximation might give us a
better reconstruction of the signal. We can justify this assumption by looking at how the
27
Figure 19: A four-bit analog-to-digital converter.
• We can nd the transfer function of the ADC by taking the Fourier transform of the
impulse response:
Z∞
F {h(t)} = H(jω) = h(t)e−jωt dt
−∞
ZT
= e−jωt dt
0
t=T
e−jωt
=
−jω t=0
e−jωT − e0
=
−jω
jωT
e− 2 h − jωT jωT
i
= e 2 −e 2
−jω
jωT
" jωT jωT
#
2e− 2 e 2 − e− 2
=
ω 2j
jωT
2e− 2
ωT
= sin
ω 2
ωT
− jωT sin 2
= Te 2 ωT
2
ωT
sin 2
|H(jω)| = T ωT
.
2
28
In the limit, as x → 0, sin(x)/x → 1. For x = ±nπ , sin(x)/x = 0. Therefore the ADC
ωT
= ±nπ
2
ωT = ±n2π
2π
ω = ±n
T
= ±nωs
Remember ωs is the sampling frequency! But also remember that when the original signal
was ideally sampled there were an innite number of copies of the signal spectrum placed
1
at ±nωs and scaled by
T . So the ADC has scaled the signal back to its original magnitiude
and placed a null at the centre of each copy of the spectrum. Let's sketch the spectrum of
|X(jω)|
|X̄(jω)|
1
T
|H(jω)|
T
|X̄(jω)H(jω)|
Figure 20: Plots of (a) signal spectrum, (b) spectrum of ideally sampled signal, (c) transfer
function of ADC and (d) spectrum of staircase approximation of signal.
• We can now design an analog lowpass lter to remove the residues at ±nωs and hence
29
very closely reconstruct the original signal.
• The more over-sampled the signal is the less sharp the cut-o of the lowpass lter
needs to be since the space between the copies of the spectra is increased.
• Also, if the signal is oversampled there will be less shaping of the signal spectrum
which we are trying to recover since the sin(x)/x will be approxiately at over a
30
2.6 Interpolation (Upsampling)
By interpolation we mean increasing the sampling rate by an interger number. Suppose we
want to increase the sampling rate L times. We will now have L−1 new samples between
each sample. How should we do this? There are two possible solutions, one is better than
the other...
hold and a lowpass lter. We can then resample the signal at higher rate. This is the
ugly approach, since we must go back into the 'analog world' and we are sure to lose
2. Digitally resample and lowpass lter : This is the pretty way of doing it. Let's inves-
tigate how...
When we sampled the signal we obtained a sequence of numbers which represent the
samples (the impulses). What happens if we resample this signal L times faster?
Assume that sampling the signal x(t) gives x[n] = {12, 9, 15} resampling this signal
L=4 times faster will give new signal y[n] = {12, 0, 0, 0, 9, 0, 0, 0, 15} (c.f. Figure 21).
If the time between the samples of x[n] is T1 then the time between the samples of y[n]
T1
is T2 = L . Hence we may write y[n] as
∞
X
y[n] = x[k]δ(n − kL)
k=−∞
= ... + x[0]δ(n − 0) + x[1]δ(n − L) + x[2]δ(n − 2L) + ...
31
Let's see what the discrete time Fourier transform of y[n] looks like:
∞
X
Ȳ (jω) = Ỹ (ejωT2 ) = y[n]e−jnωT2
n=−∞
∞
X ∞
X
= x[k]δ(n − kL)e−jnωT2
k=−∞ n=−∞
∞ ∞
" #
X X
= x[k] δ(n − kL)e−jnωT2
k=−∞ n=−∞
X∞
= x[k]e−j(kL)ωT2
k=−∞
jωT2 L
= X̃(e )
jωT1
= X̃(e )
2π
• Hence the spectrum of y[n] (which is sampled at ω2 = T2 ) is identical to that of x[n].
• Therefore there are copies of X(jω) (the spectrum of the analog signal x(t)) at ±nω1 ,
1
and scaled by
T1 .
• But if we had sampled the analog signal x(t) at ω2 rad/s we would only have copies
So, we need the remove the copies which are not centered at ±nω2 . We do this with a
digital lowpass lter. Each copy can have a footprint on the frequency axis of ± ω21 .
ω1 ω2 2π π
• So, the lter cuto frequency needs to be at ω= 2 = 2L = 2LT2 = LT2 . The digital
π
lter has a cuto at ωT2 = L.
1
The spectrum is still scaled by
T1 so if the lter has a gain of L the ltered spectrum will
1
have a scaling of
T2 as required. By ltering y[n] this lter will output an interpolation of
π
2. Lowpass lter with a cuto at ωT2 = L and a gain of L.
32
|X(jω)|
ω1 ω2
2
= 2L
π
ωT2 = L
|H(ejωT2 )|
|X̃int (ejωT2 )|
1
T2
−ω2 ω2 ω
Figure 22: A example of upsampling with L = 3. Shown is the original signal spectrum,
|X(jω)|, and the spectrum of the sampled and then upsampled signal, |Ỹ (ejωT2 )|. Also
shown is the lowpass lter used to interpolate between the original samples, |H̃(e
jωT2 )|,
π jωT
which has a cuto at ωT2 =
L . The nal upsampled spectrum is shown as |X̃int (e
2 )|.
33
2.7 Decimation (Downsampling)
Downsampling involves decreasing the sample rate by interger multiples. Assume we have
sampled a signal x(t) every T1 seconds to get x[n]. When downsampling by a factor of M
we take every M th sample:
This is equivalent to sampling the original signal every T2 = M T1 seconds (c.f. Figure 23).
This results in copies of the analog signal spectrum, X(jω), being shifted to multiples
2π ω1 1
of ω2 = T2 = M and scaled by T2 .
This is simply a re-statment of the sampling theorem. Before the signal was sampled all
ω1
frequencies greater than ω=
2 were removed using an analog anti-alias lter. Before we
ω ω1
downsample we can remove, from x[n], all remaining frequencies greater than ω = 2 =
2 2M
using a lowpass digital lter, H̃(e
jωT1 ). Hence H̃(ejωT1 ) has a cuto at ωT = π . We call
1 M
this ltered version xlpf [n]. Now, the downsampled sequence is y[n] = xlpf [M n].
In summary, to downsample x[n] by a rate M:
π
1. Low pass lter x[n] with a cuto ωT1 = M and unity gain to get xlpf [n].
2. Take every M th sample from xlpf [n] to get the downsampled signal, y[n] = xlpf [M n].
34
|X(jω)|
|X̃(ejωT1 )|
1
T1
|H̃(ejωT1 )|
M =2
1
T1
|X̃(ejωT1 )H̃(ejωT1 )|
1
T1
|Ỹ (ejωT2 )|
1
T2
−6ω2 −5ω2 −4ω2 −3ω2 −2ω2 −ω2 ω2 2ω2 3ω2 4ω2 5ω2 6ω2 ω
−3ω1 −2ω1 −ω1 ω1 2ω1 3ω1
35
3 Optimum and Adaptive ltering
In the 1940s Norbert Wiener conducted fundamental research into the following problem:
given a measured signal, x[n], which is a corrupted version of the desired signal d[n], what
linear lter, w[n], will provide the best estimate of d[n] from the measured values of x[n].
First we will deal with the FIR time-invariant Wiener lter.
d[n] x[n] ˆ
d[n]
Corruption Wiener Filter w[n]
(Noise + Distortion)
Figure 25: Wiener lter. Desired signal, d[n]. Corrupted signal, x[n]. Estimated signal,
ˆ .
d[n]
w[k], which would minimse the mean-square error between the estimate of the signal,
ˆ = Σp−1 w[k]x[n − k], and the desired signal, d[n]. Hence we dene the error, e[n], as:
d[n] k=0
ˆ
e[n] = d[n] − d[n]
p−1
X
e[n] = d[n] − w[k]x[n − k].
k=0
p−1
!2
X
= E (e[n])2 = E
d[n] − w[k]x[n − k] .
k=0
We wish to minimise this expression with respect to each of the w[i]. Therefore we dier-
entiate to get
∂ ∂
E (e[n])2
=
∂w[i] ∂w[i]
∂(·)
∂x and E {·} are both linear operators, so their order can be interchanged:
∂ ∂
(e[n])2
= E
∂w[i] ∂w[i]
∂e[n]
= E 2e[n]
∂w[i]
∂e[n]
= 2E e[n]
∂w[i]
36
Pp−1
But e[n] = d[n] − k=0 w[k]x[n − k], which when expanded is e[n] = d[n] − w[0]x[n] −
w[1]x[n − 1]..., so
∂e[n]
= −x[n − i].
∂w[i]
This gives
∂
= 2E {−e[n]x[n − i]}
∂w[i]
∂
= −2E {e[n]x[n − i]}
∂w[i]
∂
We then minimse by setting
∂w[i] equal to zero for each i = 0, ..., (p − 1):
This tells us that our error when trying to recover the signal, d[n], must be uncorrelated
with the measured signal x[n]. If our error was in some way dependent on the input to
the lter, x[n], then we would expect we could remove the dependent part using a better
p−1
( ! )
X
E d[n] − w[k]x[n − k] x[n − i] = 0, i = 0, ..., (p − 1)
k=0
p−1
( !)
X
E (d[n]x[n − i]) − x[n − i] w[k]x[n − k] = 0, i = 0, ..., (p − 1)
k=0
p−1
( )
X
E {d[n]x[n − i]} − E x[n − i] w[k]x[n − k] = 0, i = 0, ..., (p − 1)
k=0
E {d[n]x[n − i]} −
E {x[n − i] (w[0]x[n − 0] + ... + w[p − 1]x[n − p + 1])} = 0, i = 0, ..., (p − 1)
E {d[n]x[n − i]}
−w[0]E {x[n − i]x[n − 0]} + ... + w[p − 1]E {x[n − i]x[n − p + 1]} = 0, i = 0, ..., (p − 1) (4)
p−1
X
E {d[n]x[n − i]} = w[k]E {x[n − i]x[n − k]} i = 0, ..., (p − 1) (5)
k=0
37
Hence Equation 5 becomes
p−1
X
rdx [i] = w[k]rxx [k − i], i = 0, ..., (p − 1).
k=0
These p equations are called the Wiener-Hopf equations, due to their introduction by
Norbert Wiener and Eberhard Hopf whilst working at MIT. It's called a Wiener lter
because Hopf moved to Germany in 1936, where National Socialist German Workers' Party
was in power, and his much of his contribution went unacknowledged. History is written
by the victors!
rxx [0] rxx [1] rxx [2] · · · rxx [p − 1] w[0] rdx [0]
rxx [1] rxx [0] rxx [1]
w[1]
rdx [1]
rxx [2] rxx [1] rxx [0]
w[3] =
rdx [3] ,
. .. . .
. . . .
. . .
rxx [p − 1] rxx [0] w[p − 1] rdx [p − 1]
which is the matrix form of the Wiener-Hopf equations. Written more compactly in matrix
algebra we have:
Rxx w = rdx .
There is an algorithm called the Levinson-Durbin algorithm which eciently solves these
equations. We will meet this algorithm later in the Speech Processing section.
wopt = R−1
xx rdx
In order to nd these lter coecients we must estimate the autocorrelation and cross-
correlation statistics. This makes a big assumption: that the statistics are stationary! If
they change we're in trouble. That's why this is called a time-invariant lter, because the
38
3.1.1 Minimum mean squared error of Wiener lter
We can calculate the expected minimum mean square error:
This is e[n]
z }| {
p−1
!
2 X
= E e [n] = E e[n] d[n] − w[k]x[n − k]
k=0
p−1
X
= E {e[n]d[n]} − w[k] E {e[n]x[n − k]}
| {z }
k=0
This is zero
= E {e[n]d[n]} (6)
For optimal lter coecients E {e[n]x[n]} = 0, from Equation 3. Next, we substitute the
This is e[n]
z }| {
p−1
!
X
= E d[n] − w[k]x[n − k] d[n]
k=0
p−1
X
= E {d[n]d[n]} − w[k]E {d[n]x[n − k]}
k=0
p−1
X
= rdd [0] − w[k]rdx [k].
k=0
39
3.1.2 Corruption due to uncorrelated noise
When calulating the optimum lter parameters we must estimate rxx [k] and rdx [k]. If we
assume that noise, v[n], has simply been added to the original signal,
If we also assume the noise is uncorrelated with d[n] (rdx [k] = E {d[n]v[n − k]} = 0) then
Also,
40
Example: Given signal statistics design Wiener lter
Given that d[n] is known to be a process with an autocorrelation given by rdd [k] = a|k| ,
with 0 < a < 1. Additive white noise with a variance of σ2 has corrupted d[n] to give
x[n]. Design an optimum second order lter to retrieve an estimate of d[n] from x[n]. If
a = 0.8 and σ
2 = 1, determine the lter coecients. Estimate the mean square error of
the output.
Since d[n] and v[n] are uncorrelated and v[n] is white noise, we get rxx [k] = rdd [k]+rvv [k] =
a|k| + σ 2 δ[k]. Also, rdx [k] = rdd [k]. So,
Solving gives
" # " #
w[0] 1 1 + σ 2 − a2
= .
w[1] (1 + σ 2 )2 − a2 aσ 2
When a = 0.8 and σ2 = 1 we have wT = [ 0.4048 0.2381 ]. Figure 26 shows the lter
1.5
0.5
x[n]
−0.5
−1
−1.5
−2
−2.5
0 20 40 60 80 100
n
Figure 26: Wiener Filtering. Dashed: Original signal. Dotted: Signal with white noise
added. Solid: ltered signal.
41
|Wopt (ejωT )|
Noise
Signal
π ωT
Figure 27: Spectral illustration of Wiener ltering a noisy signal. The lter tries to preserve
as much signal and removes as much noise as possible.
" #" #
0
h i 0.5952 −0.2381 1
= 0.8 − 1 0.8
−0.2381 0.5952 0.8
= 0.2048
42
3.2 Adaptive ltering
So what's wrong with Wiener ltering? Two things:
2. Estimating correlation statistics takes time since we must wait for the data and then
compute an average.
optimal value for the lter coecients, wn , We can use a steepest descent algorithm to
make iterative changes until we arrive at the new optimum. This might seem pointless since
we can simply nd the new optimum in one move by solving the Wiener-Hopf equations,
but the value of this method will become clear when we try to solve problem 2 above.
p−1
!2
X
= E e2 [n] = E
d[n] − wn [k]x[n − k]
k=0
is a quadratic function in wn [k]. The error surface traced out by varying each wn [k] is a p
dimensional quadratic 'bowl' which has only one minimum. Figure 28 illustrates this for
We can move towards that mimium by stepping a small distance, µ, in a direction down
the surface to get wn+1 [k]. The direction we wish to move is the opposite of the steepest
direction up the slope (grad() = ∇). Hence our updated coecients are
wn+1 = wn − µ (∇)
⇓
∂
wn+1 [0] wn [0] ∂wn [0]
∂
wn+1 [1] wn [1]
=
− µ ∂wn [1]
. . .
. .
.
.
.
.
wn+1 [p − 1] wn [p − 1] ∂
∂wn [p−1]
∂
So we need to nd the derivatives
∂wn [i] . (We did this earlier, but here it is again.)
∂ ∂
E (e[n])2
=
∂wn [i] ∂wn [i]
43
w[1]
w[0]
w[1]
wn
wn+1
w[0]
44
∂(·)
∂x and E {·} are both linear operators, so their order can be interchanged:
∂ ∂
(e[n])2
= E
∂wn [i] ∂wn [i]
∂e[n]
= E 2e[n]
∂wn [i]
∂e[n]
= 2E e[n]
∂wn [i]
Pp−1
But e[n] = d[n] − k=0 wn [k]x[n − k], which when expanded is e[n] = d[n] − wn [0]x[n] −
wn [1]x[n − 1] − ... − wn [p − 1]x[n − p + 1], so
∂e[n]
= −x[n − i].
∂wn [i]
This gives
∂
= 2E {−e[n]x[n − i]}
∂wn [i]
∂
= −2E {e[n]x[n − i]}
∂wn [i]
d[n]
x[n] ˆ
d[n]
Filter w[n]
∂
µ ∂w[k]
Adaptive e[n]
Algorithm
If the signal statistics remain constant, this will converge to the optimum lter. If the
statistics change, it will try to follow them. However problem 2 still remains: we must
45
3.2.2 The LMS algorithm
The required expectation in Equation 7 may be estimated using the sample mean,
L−1
1 X
E {e[n]x[n − k]} = e[n − l]x[n − k − l].
L
l=0
wn+1 [0] wn [0] e[n]x[n]
wn+1 [1] wn [1] e[n]x[n − 1]
.
= .
+ µ .
.
. . .
. . .
wn+1 [p − 1] wn [p − 1] e[n]x[n − p + 1]
• This is hardware ecient since each update requires approximately one vector mul-
• This comes at the price of having slower convergence properties than the Steepest
Descent algorithm.
• It will sometimes move up the error surface, since we are using crude approximation
One known technical limitation on this method is that the step size, µ, must be kept small
2
if the algorithm is to converge. In fact it must be smaller than
λmax , where λmax is the
2
µ<
λmax
d[n], is sent down the channel. The receiver measures x[n] which is a corrupted
version of d[n]. The adaptive lter then 'equalises' the eects of the channel. Once
the training signal has nished the lter taps are frozen an the data is transmitted.
after some data has been transmitted, to adjust to the changing channel.
46
3.3 Adaptive system identication
Equalisation essentially looks for the inverse of the eect which corrupted the original
signal. Another interesting area where adaptive ltering can be applied is in system iden-
ˆ
d[n]
System Model w[n]
We put the same input into the lter and the system and adjust the lter until both
outputs are the same, or close. Conceptually the setup is dierent, but the mathematics
ˆ
d[n]
System Model w[n]
∂
µ ∂w[i]
Adaptive algorithm
Figure 32 illustrates the convergence properties of the LMS algorithm applied to system
identication. The system impulse response is h[n] = [ 1 0.2 ]. The input to both the
47
system and the lter, x[n] is coloured noise, with an autocorrelation of rxx [k] = 0.8|k| . We
see that after about 1000 samples the system has been adequately modelled.
0.4
1
0.3
0.2 0.8
e[n]=d[n]−dest[n]
Coefficients
0.1
0 0.6
−0.1
0.4
−0.2
−0.3
0.2
−0.4
−0.5 0
0 500 1000 1500 2000 0 500 1000 1500 2000
n n
Figure 32: Adaptive system identication example. Coecients converge to the known
system coecients.
48
4 Spectral analysis
Spectral analysis deals with the examination of the frequency content of random signals.
Since we do not have an explicit expression for the signals in question we cannot directly
calculate their spectra. However using the statistics of the signals we can estimate on
average what frequenies contribute to the signal power to what amount.
Since we are dealing with stochastic processes we rst need to quickly revise some basics
• We can dene the probability that X will be below, or equal to, some value x as
FX (x) = P (X ≤ x).
This is called the cumalitive distribution function (CDF). This has some obvious
properties:
dFX (x)
fX (x) = .
dx
P (x1 ≤ X ≤ x2 ) = P (X ≤ x2 ) − P (X ≤ x1 )
= FX (x2 ) − FX (x1 )
Z x2
= fX (x)dx.
x1
R∞
Hence,
−∞ fX (x)dx = 1. Figure 33 shows the CDF and PDF for a Gaussian random
variable:
−(x−µ)
1 2σ 2
fX (x) = √ e .
σ 2π
49
1
0.8
0.6
0.4
0.2
0
−5 −4 −3 −2 −1 0 1 2 3 4 5
0.4
0.3
0.2
0.1
0
−5 −4 −3 −2 −1 0 1 2 3 4 5
Figure 33: CDF and PDF for a Gaussian random variable with zero mean and variance
σ 2 = 1.
• The expected value of a function of a random variable, g(X), is the average over a
large (innite) number of experiments. Since the event X=x occurs with a relative
Z ∞
E {g(X)} = g(x)fX (x)dx
−∞
• The nth moment of a random variable is dened as the expected value of g(x) = xn :
Z ∞
n
E {X } = xn fX (x)dx.
−∞
elapses. We call this output a random process, X(t), rather than a random variable.
The CDF and PDF for a random process are dened as for a random variable, except
FX (x, t) = P (X(t) ≤ x)
dFX (x, t)
fX (x, t) = .
dx
• If we x t = t1 , then X(t1 ) is simply a random variable and all the rules for random
variables apply. If we were to estimate the CDF, FX (x, t1 ) and PDF, fX (x, t1 ) we
would need to set up a large (preferably innite) number of experiments, all running
simultaneously and inspect them when t = t1 as shown in Figure 34. This large
50
group of experiments is called an ensemble. Once we have estimated fX (x, t1 ) we
Z ∞
E {X(t1 )} = xfX (x, t1 )dx
−∞
X1 (t)
X2 (t)
X∞ (t)
Figure 34: An ensemble of random processes, X(t), which we would use to estimate
fX (x, t1 ).
4.1.3 Stationarity
• A process is said to strictly stationary if all its statistics are independent of time. If
• A less strict version of stationarity often used is wide sense stationary. In this case
Since the autocorrelation is stationary for a wide sense stationary process, the fol-
51
lowing is true:
Hence, rXX (t1 , t1 + τ ) is only a function of the time dierence, τ. Hence, for a wide
4.1.4 Ergodicity
• To estimate the expected value (the mean) of a random process X(t) at time t = t1 ,
we have had to examine a large number of concurrent experiments and nd the
Z t1 +T
1
E {X(t1 )} = lim X(t)dt,
T →∞ 2T t1 −T
then the process is said to be ergodic in the mean. If the same is true for the
autocorrelation
Z t1 +T
1
E {X(t1 )X(t1 + τ )} = lim X (t) X (t + τ ) dt
T →∞ 2T t1 −T
ergodic.
• To be ergodic the process must be stationary. But, a process can be stationary but
52
4.2 Continuous-time power spectral density
The power spectral density (PSD) and the autocorrelation function of a signal, say x(t),
form a Fourier transform pair:
Z∞
Sxx (ω) = rxx (τ )e−jωτ dτ,
−∞
Z∞
1
rxx (τ ) = Sxx (ω)ejωτ dω.
2π
∞
We do not prove this here. But this fact is called the Wiener-Kinchin theorem and we use
it as a starting point.
• We can see why this might make sense by examining the instantaneous power of the
signal:
Z∞
2
1
E x (t) = rxx (0) = Sxx (ω)dω.
2π
−∞
This is the integral of the power spectral density at all frequencies. This is why the
2
units of the PSD is Watts/Hz or V /Hz. It can be shown that if x(t) is real then
53
4.3 Discrete-time power spectral density
We assume we are examining the process x[n]. The autocorrelation becomes rxx [k] =
E {x[n]x[n + k]}, where k is an integer. Since we have samples of r[k] can write PSD as
∞
X
S̃xx (e jωT
)= rxx [k]e−jωkT .
k=−∞
2π
If the signal x[n] was sampled at rate, ωs = T , (every T seconds) this gives
∞
jωT 1 X
Sxx (e ) =
e Sxx (ω − ωs )
T
k=−∞
If ωs is greater than twice the maximum frequency in x(t) (bandlimited) then we can write
1
Sexx (ejωT ) = Sxx (ω) for |ωT | < π.
T
Sxx (ω)
S̃xx (ejωT )
1
T
54
4.4 The periodogram
Ultimately we will have to estimate the PSD from the data. If we have a signal x[n] and
we have collected N samples, we can model this by multiplying x[n] by a window function
w[n] to get
v[n] = w[n]x[n],
where the window function has the property of being a square pulse:
w[n] = 1 if 0 ≤ n ≤ N − 1,
= 0 otherwise.
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
−6 −4 −2 0 2 4 6 8 10 12 14
N −1
1 X
rvv [k] ≈ v[n]v[n + k]
N
n=0
N −1
1 X
= (w[n]x[n]) (w[n + k]x[n + k]) .
N
n=0
55
The estimated PSD of the windowed process is therefore
∞
X
Sevv (ejωT ) = rvv [k]e−jωkT
k=−∞
∞ N −1
X 1 X
≈ v[n]v[n + k]e−jωkT
N
k=−∞ n=0
∞ N −1
1 X X
≈ v[n]ejωnT v[n + k]e−jω(n+k)T
N
k=−∞ n=0
"N −1 #" ∞
#
1 X X
≈ v[n]ejωnT v[n + k]e−jω(n+k)T
N
n=0 k=−∞
"N −1 # "N −1 #
1 X jωnT
X
−jωmT
≈ v[n]e v[m]e
N
n=0 m=0
Ṽ ∗ (ejωT )Ṽ (ejωT )
≈
N
|Ṽ (ejωT )|2
≈ . (8)
N
We forgot to normalise for the energy in the window so we add this normalising factor:
PN −1
U= n=0 (w[n])2 :
|Ṽ (ejωT )|2
Sevv (ejωT ) ≈ . (9)
NU
Which for the rectangular window is U = N.
This is the DFT (discrete Fourier transform since we only used N samples) magnitude
squared divided by the number of samples, N, and the energy in the window, U. This is
Remember, we cannot practically know what's going on at all frequncies, so the DFT
2πm
is usually evaluated at ωT = N this give the machine-calculable DFT:
N −1
X 2πm
V̄ [m] = Ṽ (e jωT
) = v[n]e−jn
N
ωT = 2πm
N n=0
56
Example
We can estimate the power spectral density of a sinusoid, x[n] = sin(ω0 nT ). Let's choose
π
ω0 T = 2 . So, x[n] = sin( nπ
2 ). We'll take N =8 samples of the signal. Figure 37 show the
0.5
−0.5
−1
1 2 3 4 5 6 7 8
n
0.25
0.2
0.15
0.1
0.05
0
1 2 3 4 5 6 7 8 9
m
0.25
0.2
0.15
0.1
0.05
0
0 5 10 15 20 25 30 35
m
too.
We can also take more samples and see what happens. If we take N = 64 samples and
57
1
0.5
−0.5
−1
0 10 20 30 40 50 60 70
n
0.25
0.2
0.15
0.1
0.05
0 10 20 30 40 50 60
m
0.25
0.2
0.15
0.1
0.05
0
0 50 100 150 200 250
m
58
4.5 Window functions
We can investigate what eect using a window function had on our periodogram estimator.
n o
We want to nd the bias in our estimation. Hence we are looking for E S̃vv (ejωT ) as
∞
( )
n o X
−jωkT
E S̃vv (ejωT ) = E rvv [k]e
k=−∞
= ... + E {rvv [−1]} e−jω(−1)T + E {rvv [0]} e−jω(0)T + E {rvv [1]} e−jω(1)T + ...
X∞
= E {rvv [k]} e−jωkT
k=−∞
∞ N −1
( )
X 1 X
= E w[n]x[n]w[n + k]x[n + k] e−jωkT
N
k=−∞ n=0
∞ N −1
" #
X 1 X
= w[n]w[n + k] E {x[n]x[n + k]} e−jωkT
N
k=−∞ n=0
X∞
= rww [k]rxx [k]e−jωkT
k=−∞
So the expected value (the bias) of the periodogram estimator is the DTFT of the product
of the autocorrelation of the signal and the autocorrelation of the window. But, the discrete
π
ZT
n
jωT
o 1
E S̃vv (e ) = S̃ww (ej(ω−θ)T )S̃xx (ejθT )dθ.
2π
θ=− Tπ
• So the window, w[n], has the eect of smearing the frequency content of S̃xx (ejωT )
across the frequency spectrum. This is called spectral leakage.
• We can tailor the window shape to try and reduce this eect.
PN −1
• Don't forget to normalise for the window energy: U= n=0 (w[n])2
π
ZT
n o 1
E S̃vv (ejωT ) = S̃ww (ej(ω−θ)T )S̃xx (ejθT )dθ.
2πU
θ=− Tπ
59
The rectangular window which we have been using up until now has very abrupt transitions
in the time domain. So, its frequencies spectrum is very broad. We take the DTFT to see
this:
∞
X
jωT
W̃ (e ) = w[n]e−jωnT
n=−∞
N −1
X n
= e−jωT
n=0
1 − e−jωT N
= (by the geometric series)
1 − e−jωT
jωN T jωN T jωN T
e− 2 e 2 − e− 2
= jωT
jωT jωT
e− 2 e 2 − e− 2
jω(N −1)T sin(N ωT
2 )
= e− 2 .
sin( ωT
2 )
sin(N ωT
2 )
|W̃ (ejωT )| = .
sin( ωT
2 )
But we can show (similar to an earlier calculation in Equation 8) that S̃ww (ejωT ) is
sin(N x)
• As ω → 0, |S̃ww (ejωT )| → N , since limx→0 sin(x) = N.
N ωT
= ±kπ
2
m
2π
ωT = ±k
N
60
Figure 39: Spectra of a selection of window functions. The importance of resolution versus
spectral leakage trade-o becomes apparent.
61
• Hence the width of the main lobe is
4π
∆ωT = .
N
• This is the approximate frequency resolution of the periodogram when using a rect-
angular window.
• If two frequencies are closer than this, the main lobes in the periodogram will overlap
increasing N the spectrum becomes more like a delta function. The resolution in
2
∆f = .
NT
• jωT
n N →o∞, S̃ww (e ) → 2πU δ(ω),
Taken to its extreme, as which gives the following
π
ZT
n o 1
E S̃(ejωT ) = 2πU δ(ω − θ)S̃xx (ejθT )dθ = S̃xx (ejωT ).
2πU
− Tπ
So the estimator is unbiased in the limit as N →∞ (on average it gives the correct
answer).
62
The most widely-used window function to reduce spectral smearing, or leakage, is probably
n
whamming [n] = 0.538 − 0.462 cos 2π for 0 ≤ n ≤ N − 1.
N −1
The tails of the spectrum are reduced, causing more contained spectral leakage but at
the cost of a loss in resolution since the main lobe is widened. There are other windows,
n n
wblackman [n] = 0.42 − 0.5 cos 2π − 0.08 cos 4π
N −1 N −1
n n
wf lattop [n] = 1 − 1.93 cos 2π + 1.29 cos 4π
N −1 N −1
n n
−0.388 cos 6π + 0.032 cos 8π .
N −1 N −1
These windows have lower spectral tails, but broader main lobes.
• If enough data is available we can use a window to reduce the leakage and make N
large enough to get the resolution we want.
1 1 1.2
1
0.8 0.8
0.8
0.6 0.6 0.6
0.2
0.2 0.2
0
0 0 −0.2
0 50 100 150 200 250 300 0 50 100 150 200 250 300 0 50 100 150 200 250 300
Figure 40: Hamming window, Blackman window and Flat-top window. N = 256.
63
4.6 The averaged periodogram
While we see that the periodogram is unbiased (on average it will give the right answer),
it can be shown (not shown here though) that as N →∞ the variance does not tend to
zero! We say it is not a consistant estimate. In fact, it can be shown that the variance of
2
var Svv (ejωT ) ≈ Sxx (ejωT ) .
(The reason for this is embedded in the fact that: while, as N → ∞, rvv [k] =
1 PN −1
N n=0 (w[n]x[n]) (w[n + k]x[n + k]) → rxx [k], it does not do so uniformly. Choose any
value of and it is impossible to choose a value of N such that |rvv [k] − rxx [k]| < for all
k .)
• Weird! So, what am I talking about? Well, take white Gaussian noise for instance.
∞
X
S̃xx (e jωT
) = rxx [k]e−jωkT
k=−∞
2 −0
= σ e
= σ2
Hence it is has a at power spectrum. Figure 41 shows two estimates of the power
4000 4000
3000 3000
2000 2000
1000 1000
0 0
−1000 −1000
−2000 −2000
−3000 −3000
0 100 200 300 400 500 600 0 100 200 300 400 500 600
n n
5000 7000
4500
6000
4000
3500 5000
3000
4000
2500
3000
2000
1500 2000
1000
1000
500
0 0
0 20 40 60 80 100 120 140 0 20 40 60 80 100 120 140
m m
Figure 41: Power spectrum estimates of white gaussian noise, estimated using a peri-
odogram with N = 512. The spectrum is not as 'at' as expected.
64
• Everytime we estimate the spectrum we usually get the wrong answer, and the vari-
• And does not get any better when we increase N as we see from Figure 42, where
N = 2048.
4000
3000
2000
1000
−1000
−2000
−3000
−4000
0 500 1000 1500 2000 2500
n
2000
1800
1600
1400
1200
1000
800
600
400
200
0
0 20 40 60 80 100 120 140
m
Figure 42: PSD estimated from 2048 samples of a Gaussian process. Variance is about the
same as when N = 512.
• The data is rst broken into segments. Suppose we have N = KL data points. We
• The periodogram of the k th segment is (including the normalisation for the window
N −1 2
energy, U n=0 (w[n]) ):
65
• Our nal estimate is the average of all K periodograms:
K−1
jωT 1 X k jωT
S̃vv (e )= S̃vv (e ).
K
k=0
4 4 4 4
2 2 2 2
Data 0 0 0 0
segments −2 −2 −2 −2
−4 −4 −4 −4
0 50 100 0 50 100 0 50 100 0 50 100
time time time time
4 4 4 2
2 2 1
2
Windowed 0 0 0
Data 0
−2 −2 −1
−4 −4 −2 −2
0 50 100 0 50 100 0 50 100 0 50 100
time time time time
0 0 0 0
10 10 10 10
Periodograms
−5 −5 −5
10 10 10
10
−2 Averaged
Periodogram
−4
10
0 0.5 1 1.5 2
ω 4
x 10
• Since at any frequency the estimate is the average of K i.i.d. variables (independent
and identically distributed variables), this will result in a reduction in the variance
by a factor of K:
h i S̃ 2 (ejωT )
var S̃vv (ejωT ) ≈ xx .
K
So the more segments we have the lower the variance of our estimate.
66
Example: Spectrum analyser
You have been asked to design a spectrum analyser to be used in an electronic tuning
device which will sample a band-limited sound signal at a rate of 8 kHz. It must have a
frequency resolution of 5Hz, and the standard deviation of the estimate must be within
• To get the standard deviation down to 20% the true value we need
S̃xx (ejωT )
r h i
var S̃vv (ejωT ) ≈ √ ≤ 0.2S̃xx (ejωT )
K
√
K ≥ 5
K ≥ 25
2
∆f = ,
LT
4(8000)
5 =
L
L = 6400
6399
X 2πm
V̄ k [m] = Ṽ k (ejωT ) = vk [n]e−jn
N
ωT = 2πm
N n=0
67
4.7 Short-time Fourier transform
Suppose we have a signal which is nonstationary. Take for example the signals in Figure 44.
These are obviously two dierent signals, but their power spectral density estimates are the
same. What would be more useful would be some sort of time-frequency representation.
We want something like what is shown in Figure 45. Here we can see how an estimate of
1 0.05
0.8 0.045
0.6 0.04
0.4 0.035
0.2 0.03
0 0.025
−0.2 0.02
−0.4 0.015
−0.6 0.01
−0.8 0.005
−1 0
0 20 40 60 80 100 120 140 0 500 1000 1500 2000 2500
n m
1 0.05
0.8 0.045
0.6 0.04
0.4 0.035
0.2 0.03
0 0.025
−0.2 0.02
−0.4 0.015
−0.6 0.01
−0.8 0.005
−1 0
0 20 40 60 80 100 120 140 0 500 1000 1500 2000 2500
n m
Figure 44: A single periodogram estimator of the power spectral density of a nonstationary
π π
signal. The two tones in the signal have frequencies at ω1 T = 4 and ω2 T = 8.
We can use the short-time Fourier transform to derive a plot similar to that of Figure 45...
68
1
3
0.8
0.6 2.5
0.4
2
0.2
ωT
0
1.5
−0.2
−0.4 1
−0.6
0.5
−0.8
−1 0
0 20 40 60 80 100 120 140 0 20 40 60 80 100 120 140
n n
1
3
0.8
0.6 2.5
0.4
2
0.2
ωT
0
1.5
−0.2
−0.4 1
−0.6
0.5
−0.8
−1 0
0 20 40 60 80 100 120 140 0 20 40 60 80 100 120 140
n n
Z∞
ST F T (t0 , ω) = (g(t − t0 )f (t)) e−jωt dt
t=−∞
were g(t − t0 ) is the window function, g(t), which is shifted to be centred at t = t0 . g(t) is
a short-time window function which quickly decays to zero. A typical function would be:
2
g(t) = e−αt for α>0
• ST F T (t0 , ω) is simply the Fourier transform of the signal g(t − t0 )f (t). Figure 46
• A short section of the signal f (t) is isolated by g(t − t0 ) about the time t = t0 .
• The frequency content of this isolated section is determined using the single peri-
odogram estimator.
69
1
0.5
f(t)
0
−0.5
−1
0 200 400 600 800 1000 1200
t
0.8
g(t−t )
0.6
0
0.4
0.2
0
0 200 400 600 800 1000 1200
t0 t
0.5
f(t)g(t−t0)
−0.5
−1
0 200 400 600 800 1000 1200
t
Figure 46: An illustration of how the function g(t − t0 ) picks out the section of the chirp
signal f (t) at time t0 =500.
• If we calculate the STFT at a number of dierent times we can construct the time-
Figure 47 shows a sketch of what the spectrogram might look like for the `chirp' signal
K
∆t =
∆f
⇓
∆t∆f = K
70
ω
t = 500
Figure 47: Sketch of what the spectrogram of the chirp signal in Figure 46 might look like.
• If we want ne-grain time resolution we take small windows which will give a poor
frequency resolution.
71
4.7.3 Discrete STFT
The STFT for discrete signals is:
∞
X
ST F T [n0 , ω) = g[n − n0 ]f [n]e−jωnT
n=−∞
2πm
Of course we can evaluate this at any frequency we like. We usually set ωT = N and
vary m over over 0, ..., N − 1 to give:
∞
X 2πm
ST F T [n0 , m] = g[n − n0 ]f [n]e−j N
n
n=−∞
n0 +(N −1)
X
ST F T [n0 , ω) = g[n − n0 ]f [n]e−jωnT .
n=n0
N
X −1
ST F T [n0 , ω) = g[k]f [k + n0 ]e−jω(k+n0 )T
k=0
N −1
!
X
= ejn0 ωT g[k]f [k + n0 ]e−jωkT (10)
k=0
• The term ejn0 ωT is just a phase component so we can ignore it and examine the
magnitude.
N
X −1
|ST F T [n0 , ω0 )| = g∗ [k, ω0 )f [k + n0 ]
k=0
i.e. the Fourier transform of the window convolved with a delta function at ω0 .
72
get
sin(N ωT
2 )
|F {g[k]}| = |G̃(ejωT )| =
sin( ωT
2 )
sin(N (ω−ω0 )T )
G̃∗ (ejωT , ω0 ) = 2
sin( (ω−ω
2
0 )T
)
This looks like a narrow band-pass lter at ω = ω0 . So...
|F {ST F T [n, ω0 )}| = G̃∗ (ejωT , ω0 ) |F̃ (ejωT )|
!
sin(N (ω−ω 0 )T
)
= 2
(ω−ω0 )T
|F̃ (ejωT )|
sin( 2 )
2πm
• If we evaluate the STFT at ωT = N , for m = 0, ..., N − 1 the STFT will behave
like a bank of N narrowband lters (c.f. Figure 48), each one allowing a dierent
frequency to pass.
G∗(ejωT )
ω
|ST F T [n, ωN −1)|
ωN −1
• This is a very useful transform for analysing the structure of transient signals, like
speech. Figure 49 shows a speech waveform, its PSD and its STFT.
73
Speech
0.5
0.4 o
"ong"
S
0.3 l
y[n] 0.2
0.1
−0.1
−0.2
0 0.2 0.4 0.6 0.8 1
TIME
−2
10
S(f)
−3
10
−4
10
−5
10
0 2000 4000 6000 8000 10000
FREQUENCY (Hz)
6000
5000
FREQUENCY (Hz)
4000
3000
2000
1000
0
0 0.2 0.4 0.6 0.8 1
TIME (s)
Figure 49: (top) Speech waveform for the words So long. (middle) An estimate of the
PSD using the averaged periodogram estimator. (bottom) The spectrogram derived using
the short-time Fouier transform. The variation of frequency with time becomes apparant.
74
5 Speech processing
• The processing of digitised speech signals is an important application of DSP.
ne).
using recorded speech snippets, for reading text, for altering speech characteristics
There are three possible ways you can make a phoneme sound:
1. The vocal chords You tighten your vocal chords until they seal shut. You then
use your lungs to force them open. When they open a small noise will be made for a
short time, due to the pressure dierence. If is were possible to remove the head (!)
and listen to the sound of the vocal chords vibrating it would sound like an impulse
vocal chords. Sounds formed in this way are called voiced sounds. (/a/, /e/, /oo/,
etc.)
2. The vocal tract When the vocal chords are left relaxed and air is forced through
the vocal tract, if the vocal tract is made narrow enough (using the tongue) a hissing
sound will be created. Sounds formed in this way are called unvoiced or fricative
sounds. (/ss/, /sh/, /ch/)
75
3. Plosives By sealing the airway shut, building pressure behind the blockage and
then opening the airway we produce sounds called plosives. (/t/, /p/, /k/) However,
as a fraction of the total speech duration plosives make up very little time.
• Therefore we can model our speech process as a system excited by either a noise
• The dierence between various voiced or unvoiced sounds, is due to the variable
shape of the vocal tract. We model the vocal tract as a time-varying lter.
76
5.2 Voiced speech
5.2.1 The impulse train
So we know that voiced speech is formed by the ltering (by the vocal tract) of an impulse
train of sound, which has a period of T0 . Before we look at the ltering caused by the
vocal tract let's examine what the impulse train looks like.
∞
X
δT0 (t) = cn ejnω0 t ,
n=−∞
where,
Z π
ω0 ω0 T0 π
cn = δT0 (t)e−j 0 t dt, N ote : = .
2π −π 2 ω0
ω0
1
=
T0
Therefore,
∞
1 X jnω0 t
δT0 (t) = e .
T0 n=−∞
Z∞ X
∞
1
F {δT0 (t)} = e−j(ω−nω0 )t dt
T0
−∞ n=−∞
∞ Z∞
1 X
= e−j(ω−nω0 )t dt
T0 n=−∞
−∞
∞
2π X
= δ(j(ω − nω0 )).
T0 n=−∞
Hence, this gives an impulse train in the frequency domain (an equal contribution from all
multiples of the fundamental frequency). However, we rarely ever have a perfect impulse
train in the time domain because the vocal chords can only open at a nite speed. So what
we get is more like a ltered impulse train (c.f. Figure 51) in the frequency domain.
Since we will be creating a lter to model the eect of the vocal tract we can simply
assume that the ltering eect created by having an imperfect impulse train is included in
that lter.
77
δT0 (t)
t ω0 ω
t ω0 ω
Figure 51: Perfect and imperfect impulse train for voiced speech.
the vocal tract has on the impulse train. The sound echoes around the vocal tract and is
eected by the shape of the throat, nose, mouth, position of the tongue and lips and even
Figure 52 shows segments of voiced and unvoiced speech. We can see the impulse train
in the frequency domain and how it is shaped (ltered) by the vocal tract. The reason the
impulses in the frequency domain are not quite impulses is due to spectral smearing. (The
• Notice (by looking at the envelope of the spectrum) how the vocal tract lter transfer
Figure 53 shows the spectrogram of a speech segment. In the voiced sections, the frequency
impulses (vertical spaced dark patches) are clearly visible. In the unvoiced segments there
are no impulses and the high frequency regions contains more power than the low frequency
regions.
78
0.15
0.1
0.05
0
Amplitude
−0.05
−0.1
−0.15
−0.2
0 500 1000 1500 2000 2500 3000
n
−40
Formants
−50
−60
Power/frequency (dB/Hz)
−70
−80
−90
−100
0 2 4 6 8 10
Frequency (kHz)
Figure 52: Example of voiced speech segment (left) for the phoneme /o/ and the averaged
periodogram estimate of its power spectrum (right). The pitch harmonics are visible. We
also see how the vocal tract has ltered the harmonics.
79
Figure 53: Steve jobs says 'Hi'. Spectrogram of a speech segment. The pitch harmonics
are clearly visible. The formant structure is somewhat visible.
80
5.3 Linear predictive speech coding
• We have just observed how the spectrum of a phoneme is composed of an impulse
• But we also saw how the spectrum of the excitation signal, ε[n], which should be
`at' for an impulse train or for white Gaussian noise is shaped by the vocal tract.
• The peaks represent resonances in the vocal tract. These are preferred frequencies
• We have seen from the properties and design of lters section that we can create a
lter whose transfer function has a number of peaks and valleys by placing poles near
the unit circle. The nearer the pole is to the unit circle the larger the peak.
G
H̃(z) = Pp −k
,
1− k=1 b[k]z
where G is the gain (just using poles will make the gain everywhere greater than unity, so
G<1 to compensate). If the samples of the recorded speech are written as y[n] and we
call the impulse train or the noise excitation ε[n], then they are related by:
Ỹ (z) = H̃(z)ε̃(z)
Gε̃(z)
Ỹ (z) = Pp
1 − k=1 b[k]z −k
p
!
X
Ỹ (z) 1 − b[k]z −k = Gε̃(z)
k=1
p
X
Ỹ (z) = Gε̃(z) + b[k]Ỹ (z)z −k
k=1
p
X
y[n] = b[k]y[n − k] + Gε[n].
k=1
• We see that the current speech sample, y[n], is very much dependent on previous
samples, y[n − k] this veries our belief that speech is quite a redundant signal.
The number of previous samples needed depends on the order, p, of the lter. Usually
• If we knew the impulse train period, or the noise source power, and if we also knew
the lter coecients, b[k], we could predict what the next speech sample would be.
• This type of encoding is called linear prediction, since we predict the current sample
using a linear combination of the previous outputs.
81
• The hope is that it will take less bits to encode the pitch period, or noise source
power, and the lter coecients for a short section of speech than it would to encode
• In addition the lter parameters give us some way to quantify the dierence between
samples, say 50 ms. Sampling at 8000 Hz will give 400 samples in this section of speech.
We want to nd the parameters b[k] which will best model this section of speech. We write
our prediction for the next speech sample, ŷ[n], according to linear prediction model as
p
X
ŷ[n] = b[k]y[n − k] + Gε[n].
k=1
But we've recorded the speech and we know what that sample actually is (y[n]). So we
We want to nd the b[k] which minimises the expected mean square error,
!2
p
X
E e2 [n] = E
y[n] − b[k]y[n − k] − Gε[n] .
k=1
82
We set the derivative equal to zero to minimise for i = 1, .., p:
E {y[n − i]e[n]} = 0
p
!
X
E y[n − i] y[n] − b[k]y[n − k] − Gε[n] = 0
k=1
| {z }
This is y[n]−ŷ[n]
p
( )
X
E y[n − i]y[n] − b[k]y[n − i]y[n − k] − Gy[n − i]ε[n] = 0
k=1
p
X
E {y[n − i]y[n]} − b[k]E {y[n − i]y[n − k]} − GE {y[n − i]ε[n]} = 0
k=1
These expectations will be estimated using the 400 samples from the section we're ex-
amining. For voiced sounds, the cross-correlation between the recorded speech and the
excitation source should be zero for any lag, i.e. i > 0, since the impulse train is zero
everywhere except at the pulses, and the speech signal can't have a DC value. Hence we
p
X
E {y[n − i]y[n]} = b[k]E {y[n − i]y[n − k]} for i = 1, ..., p
k=1
p
X
ryy [i] = b[k]ryy [i − k] for i = 1, ..., p
k=1
··· ryy [p − 1]
ryy [0] ryy [1] ryy [2] b[1] ryy [1]
..
ryy [1] ryy [0] ryy [1] . ryy [p − 2]
b[2]
ryy [2]
..
ryy [2] ryy [1] ryy [0] . ryy [p − 3]
b[3]
=
r [3]
yy
. .
. .. .. .. . . .
. . . .
. . . . .
ryy [p − 1] ryy [p − 2] ryy [p − 3] · · · ryy [0] b[p] ryy [p]
Ryy b = ryy
b = R−1
yy ryy .
There is an ecient algorithm for solving matrix problems with the above form. It's called
• Figure 54 shows a speech signal, the predicted signal and the error between them for
83
4
−1
−2
−3
0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1
time (s)
2.5
1.5
0.5
−0.5
−1
−1.5
−2
0.002 0.004 0.006 0.008 0.01 0.012 0.014 0.016 0.018 0.02
time (s)
84
• Figure 55 shows the magnitude and phase of the transfer function H̃(z) for the same
short section of speech and the resulting periodogram when applied to an articial
impulse train.
40
35
30
25
Magnitude (dB)
20
15
10
−5
−10
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Normalized Frequency (´π rad/sample)
50
0
Phase (degrees)
−50
−100
−150
−200
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Normalized Frequency (´π rad/sample)
−40
Formants
−50
−60
Power/frequency (dB/Hz)
−70
−80
−90
−100
0 2 4 6 8 10
Frequency (kHz)
Figure 55: Transfer function of the lter dened by the linear prediction coecients for
the phoneme /o/. Also shown is the periodogram of the section of speech from which the
coecients were derived.
• The pitch of this speech segment is about 100 Hz. We can synthesise the speech by
passing an impulse train (with 10 ms between deltas functions) through the lter.
85
• We must adjust the gain, G, appropriately.
• Figure 56 shows the result of passing an impulse train through the lter.
1.5
0.5
−0.5
−1
0.03 0.035 0.04 0.045 0.05 0.055 0.06 0.065
time (s)
Power Spectral Density Estimate via Welch Power Spectral Density Estimate via Welch
−20 −20
−25
−30
−30
−40
−35
−50
−40
Power/frequency (dB/Hz)
Power/frequency (dB/Hz)
−60
−45
−70
−50
−80
−55
−90
−60
−100
−65
−110 −70
0 2 4 6 8 10 0 2 4 6 8 10
Frequency (kHz) Frequency (kHz)
Figure 56: (Top) Example of speech synthesised using the all-pole filter model (p = 22) derived from the actual speech phoneme /o/. Solid is the impulse train. Dashed is the synthesised speech waveform. (Bottom left) The periodogram of the synthesised speech. (Bottom right) The same periodogram with noise added.
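A minimal Python sketch of this synthesis step is given below. It assumes a coefficient vector b such as the one produced by the earlier analysis sketch; the gain G and the signal lengths are illustrative assumptions.

    import numpy as np
    from scipy.signal import lfilter

    fs = 8000
    pitch_period = int(0.010 * fs)           # 10 ms between impulses (approx. 100 Hz pitch)

    excitation = np.zeros(int(0.1 * fs))     # 0.1 s impulse-train excitation
    excitation[::pitch_period] = 1.0

    G = 1.0                                  # gain; in practice chosen to match the speech power
    a = np.concatenate(([1.0], -b))          # all-pole filter: H(z) = G / (1 - sum_k b[k] z^-k)
    synthetic = lfilter([G], a, excitation)  # synthesised speech waveform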
5.4 The Levinson-Durbin algorithm
The Levinson-Durbin recursive algorithm is a method for solving the set of linear equations

T x = y,

where T is a Toeplitz matrix (the diagonals which run from upper left to lower right contain the same entry) with a non-zero main diagonal. A special simplified case is when T is symmetric. This is what we use here for the equation R_yy b = r_yy:
1. Initialise: E_0 = r_yy[0], i = 0.
2. i = i + 1.
3. k_i = ( r_yy[i] − Σ_{j=1}^{i−1} b^(i−1)[j] r_yy[i−j] ) / E_{i−1}.
4. b^(i)[i] = k_i.
5. b^(i)[j] = b^(i−1)[j] − k_i b^(i−1)[i−j]   for j = 1, ..., i−1.
6. E_i = (1 − k_i²) E_{i−1}.
7. If i < p go to step 2; otherwise stop and take b[j] = b^(p)[j], j = 1, ..., p.
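A minimal Python implementation of the recursion above (the numerical check at the end, using the worked example that follows, is my own addition):

    import numpy as np

    def levinson_durbin(r, p):
        """Solve the symmetric Toeplitz system R_yy b = r_yy by the
        Levinson-Durbin recursion.  r = [r_yy[0], ..., r_yy[p]]."""
        b = np.zeros(p + 1)                 # b[1..p] are the predictor coefficients
        E = r[0]                            # step 1: E_0 = r_yy[0]
        for i in range(1, p + 1):           # step 2: i = i + 1
            # step 3: reflection coefficient k_i
            k = (r[i] - np.dot(b[1:i], r[i - 1:0:-1])) / E
            b_new = b.copy()
            b_new[i] = k                              # step 4
            b_new[1:i] = b[1:i] - k * b[i - 1:0:-1]   # step 5
            b = b_new
            E = (1.0 - k ** 2) * E                    # step 6
        return b[1:], E

    # Check against the worked example below: r_yy[k] = rho^|k| with p = 3
    rho = 0.5
    r = rho ** np.arange(4)
    b, E = levinson_durbin(r, 3)
    print(b)   # expected: [rho, 0, 0]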
Example: if the autocorrelation function of a speech signal y[n] is approximated by r_yy[k] = ρ^|k|, where ρ < 1, find the coefficients b[k] of a third-order (p = 3) predictor which minimise the expected squared error E{(y[n] − ŷ[n])²}. The normal equations are
[ r_yy[0]  r_yy[1]  r_yy[2] ] [ b[1] ]   [ r_yy[1] ]
[ r_yy[1]  r_yy[0]  r_yy[1] ] [ b[2] ] = [ r_yy[2] ]
[ r_yy[2]  r_yy[1]  r_yy[0] ] [ b[3] ]   [ r_yy[3] ]

i.e.

[ 1    ρ    ρ² ] [ b[1] ]   [ ρ  ]
[ ρ    1    ρ  ] [ b[2] ] = [ ρ² ]
[ ρ²   ρ    1  ] [ b[3] ]   [ ρ³ ]
If we bash out the inverse we will get:

[ b[1] ]       1      [ 1     −ρ      0 ] [ ρ  ]   [ ρ ]
[ b[2] ] =  --------  [ −ρ   1+ρ²    −ρ ] [ ρ² ] = [ 0 ].
[ b[3] ]    (1 − ρ²)  [ 0     −ρ      1 ] [ ρ³ ]   [ 0 ]
But, for a large matrix this is a little more time consuming than it needs to be. We use the Levinson-Durbin algorithm instead:
Iteration 0:

1. E_0 = r_yy[0] = 1, i = 0.

Iteration 1:

2. i = i + 1 = 1.
3. k_1 = ( r_yy[1] − Σ_{j=1}^{0} b^(0)[j] r_yy[1−j] ) / E_0 = r_yy[1] / E_0 = ρ / 1 = ρ.
4. b^(1)[1] = k_1 = ρ.
6. E_1 = (1 − k_1²) E_0 = 1 − ρ².
7. i < 3, go to step 2.

Iteration 2:

2. i = i + 1 = 2.
3. k_2 = ( r_yy[2] − b^(1)[1] r_yy[1] ) / E_1 = (ρ² − ρ·ρ) / (1 − ρ²) = 0.
4. b^(2)[2] = k_2 = 0.
5. b^(2)[1] = b^(1)[1] − k_2 b^(1)[1] = ρ.
6. E_2 = (1 − k_2²) E_1 = 1 − ρ².
7. i < 3, go to step 2.

Iteration 3:

2. i = i + 1 = 3.
3. k_3 = ( r_yy[3] − b^(2)[1] r_yy[2] − b^(2)[2] r_yy[1] ) / E_2 = (ρ³ − ρ·ρ² − 0·ρ) / (1 − ρ²) = 0.
4. b^(3)[3] = k_3 = 0.
5. b^(3)[1] = ρ, b^(3)[2] = 0.

Hence

[ b[1] ]   [ b^(3)[1] ]   [ ρ ]
[ b[2] ] = [ b^(3)[2] ] = [ 0 ].
[ b[3] ]   [ b^(3)[3] ]   [ 0 ]
5.5 The voiced/unvoiced decision

The decision as to whether a segment of speech is voiced or unvoiced is usually made by measuring several features of the segment and combining the information they provide in an optimal way (usually using a pattern classifier). Commonly used features include:
• Number of zero-crossings,
• Autocorrelation,
Figure 57 shows how we might use two features (energy and zero-crossings) to make the voiced/unvoiced/silence decision (a sketch of computing these features in code follows this list).
• Pattern classifiers, such as neural networks, for example, are used to divide up the feature space into the different classes.
• You could add any number of clever measures to aid the voiced/unvoiced decision.
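Here is a minimal Python sketch of extracting the two features used in Figure 57; the frame length and the decision thresholds are illustrative assumptions rather than values from these notes.

    import numpy as np

    def frame_features(x, frame_len):
        """Per-frame energy and zero-crossing count, as used for a simple
        voiced/unvoiced/silence decision."""
        n_frames = len(x) // frame_len
        feats = []
        for i in range(n_frames):
            frame = x[i * frame_len:(i + 1) * frame_len]
            energy = np.sum(frame ** 2)
            zero_crossings = np.sum(np.abs(np.diff(np.sign(frame))) > 0)
            feats.append((energy, zero_crossings))
        return np.array(feats)

    def classify(energy, zc, e_silence=0.1, zc_voiced=100):
        # Hypothetical thresholds; a real system tunes these or trains a classifier
        if energy < e_silence:
            return "silence"
        return "voiced" if zc < zc_voiced else "unvoiced"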
[Figure 57 appears here: (top) speech waveform against time (s); (bottom) zero-crossings against energy, with regions labelled Voiced, Unvoiced and "Silence".]
Figure 57: The voiced/unvoiced/silence decision just using the segment power (*) and number of zero-crossings (o). (The top plot has been normalised to fit everything on the same scale!)
5.6 Estimating the pitch period
• If we happen to find that the speech is voiced, then before we can code it we need to estimate the pitch period.
• In long, clean, segments of voiced speech it is quite easy to determine the pitch period
by eye.
• If the speech segment is short, or in the presence of noise, it can be difficult to see the pitch period.
• In this case there are numerous ways to estimate the pitch period, some better than others. One simple approach is to estimate the autocorrelation function as:
r[k] = (1 / (N − |k|)) Σ_{n=1}^{N−1} y[n] y[n−k].
Figure 58 shows how the autocorrelation of a voiced speech segment reveals the pitch period. A simple algorithm that searches for the first peak above a certain threshold can be used to find the peak at a lag equal to the pitch period.
Other ad-hoc methods exist, too. One is used in the LPC-10 speech standard. It is called the Average Magnitude Difference Function (AMDF):
AMDF(k) = (1/N) Σ_{n=1}^{N} |y[n] − y[n−k]|.
• When the lag is that of the pitch period the summation will become small.
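A minimal Python sketch of autocorrelation-based pitch estimation along these lines (the search range fmin–fmax and the simple peak picking are my own assumptions):

    import numpy as np

    def estimate_pitch(frame, fs, fmin=50.0, fmax=400.0):
        """Estimate the pitch period of a voiced frame from its autocorrelation:
        compute r[k] over lags corresponding to plausible pitch frequencies and
        take the lag of the largest peak.  Assumes len(frame) > fs/fmin."""
        N = len(frame)
        kmin, kmax = int(fs / fmax), int(fs / fmin)
        r = np.array([np.dot(frame[k:], frame[:N - k]) / (N - k)
                      for k in range(kmax + 1)])
        k_star = kmin + np.argmax(r[kmin:kmax + 1])   # lag of the dominant peak
        return k_star / fs                            # pitch period in seconds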
[Figure 58 appears here: (top) amplitude against sample index n; (bottom) autocorrelation against lag k.]
Figure 58: (Top) A short segment of the phoneme /o/. (Bottom) The autocorrelation of
the speech segment.
6 Image Processing
Processing of images is a fundamental concern of engineering. Some specic applications
of interest are:
• Television
• Video
• Remote sensing
• Medical imaging
As in most engineering disciplines, the number of image processing algorithms and transforms is vast. Here we will look at only a few of the fundamental topics:
• Image representation
• Image histograms
• Image ltering
• Morphological operations
6.1 Image representation

A digital image is represented by an M × N matrix, f[m,n], of pixel values, each stored using B bits.

• For a binary image the matrix will have only f[m,n] = 0 (black) or f[m,n] = 1 (white) entries (c.f. Figure 59). It uses B = 1 bit for every pixel, and hence N M bits in total.
• If we allow different grey levels between black and white we have a greyscale image.
Figure 59: 1-bit grey level image
Figure 60: (left) 4-bit image (16 grey levels). (right) 8-bit image (256 grey levels).
• Figure 61 shows an image of a man created using M = 11 and N = 8. We've made the pixels bigger and used fewer of them. We see that the resolution is too low to be useful.
• We can also represent colour images by using three M × N matrices for the red, green and blue intensities. This is called the RGB colour space. Each pixel is represented by a triplet, e.g. White = (255, 255, 255), Black = (0, 0, 0), etc.
• You may sometimes see a different colour space used: the YCbCr colour space. This is related to RGB by

[ Y  ]   [  65.481   128.553    24.966 ] [ R ]   [  16 ]
[ Cb ] = [ −37.797   −74.203   112.000 ] [ G ] + [ 128 ]    where R, G, B ∈ (0, 1).
[ Cr ]   [ 112.000   −93.786   −18.214 ] [ B ]   [ 128 ]

(A code sketch of this conversion is given after this list.)
• Alternatively, we could represent a colour image with one matrix by using a colour
palette:
Black = 0
Dark red = 1
Light green = 2
etc.
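As mentioned above, here is a minimal NumPy sketch of the RGB-to-YCbCr conversion; the function name is my own, and the Cr row follows the standard ITU-R BT.601 scaling shown in the matrix above.

    import numpy as np

    def rgb_to_ycbcr(rgb):
        """Convert an (M, N, 3) RGB image with channels in (0, 1) to YCbCr."""
        A = np.array([[ 65.481, 128.553,  24.966],
                      [-37.797, -74.203, 112.000],
                      [112.000, -93.786, -18.214]])
        offset = np.array([16.0, 128.0, 128.0])
        return rgb @ A.T + offset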
6.2 Image histograms
• Let the variable R represent the various grey-levels in the image f[m,n]. We can normalise R to lie in the range (0, 1) by dividing each pixel value by 2^B − 1, the maximum possible pixel value.
• Therefore the set of grey-levels in the image (corresponding to the pixel brightnesses) can be treated as samples of a random variable, and the image may be characterised by the PDF, f_R(R = r), of that random variable.
• The PDF, f_R(r), can be used in various ways. For example, we can determine the mean brightness of the image:

E{R} = ∫_{r=0}^{1} r f_R(r) dr ≈ (1 / (2^B − 1)) (1 / (N M)) Σ_{m=1}^{M} Σ_{n=1}^{N} f[m, n].
• This allows us to quantitatively say whether an image is `bright' (e.g. E {R} = 0.8)
or `dark' (e.g. E {R} = 0.2).
• Usually the actual PDF is not available to us and we must either (1) choose a PDF which best fits the image (Gaussian mixture model, etc.), or (2) estimate the PDF with a histogram: counting the number of occurrences of each pixel brightness and finally dividing the entire histogram by NM (the number of pixels), so that the integral of the histogram is 1 (because the area under a PDF = 1).
• The histogram can also give a good indication of the contrast of an image. If the histogram is concentrated over a small number of brightness values the contrast will be low.
• In comparison, if the image uses all of the available brightness values the contrast will be high.
• We can manipulate the contrast of the image by transforming the pixel values. We can use any mapping that sends the pixel values from the domain (0, 1) to the range (0, 1). We write this transform S = T(R), where T(·) is the transformation function. S is a random variable which denotes the brightness of the pixels in the transformed image.
[Figure 62 appears here: image histogram, # occurrences against brightness.]
Figure 62: Sample (un-normalised) histogram for the 8-bit image in Figure 60.
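A minimal Python sketch of estimating the normalised histogram and the mean brightness E{R} described above (the function name and the use of numpy.histogram are my own choices):

    import numpy as np

    def brightness_stats(img, B=8):
        """Estimate the normalised histogram (PDF estimate) and mean brightness
        E{R} of a B-bit greyscale image stored as an integer array."""
        levels = 2 ** B
        counts, _ = np.histogram(img, bins=levels, range=(0, levels))
        pdf_estimate = counts / img.size             # divide by NM so it sums to 1
        mean_brightness = img.mean() / (levels - 1)  # E{R} in (0, 1)
        return pdf_estimate, mean_brightness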
[Figure 63 appears here: image histogram, # occurrences against brightness.]
[Figure 64 appears here: image histogram (high contrast), # occurrences against brightness.]
Figure 64: High contrast image and its histogram. Some saturation is evident in the
brighter areas.
[Figure 65 appears here: a transformation function S = T(R) plotted against R.]
In fact, the Cathode Ray Tube (CRT) used in TV sets performs such a transformation, since the electron beam intensity, I_CRT, is related to the applied voltage, V_CRT, by

I_CRT = k V_CRT^γ.

To compensate, the image signal is gamma-corrected before being applied to the tube:

V_CRT = V_IMAGE^(1/γ).
• From probability theory we can show that, given a random variable X with PDF f_X(x), if this variable is transformed to get Y = g(X), for some function g(·), then

f_Y(y) = f_X(x_1) / |dg(x)/dx|_{x=x_1} + ... + f_X(x_n) / |dg(x)/dx|_{x=x_n},

where x_1, ..., x_n are the roots of the equation g(x) = y. If we assume we are using a monotonic transformation S = T(R) then there is only one root of the equation T(r) = s, which we call r_1. Hence we can write the PDF of the transformed image as

f_S(s) = f_R(r_1) / |dT(r)/dr|_{r=r_1}.     (11)
Example: consider the piecewise-linear transformation

S = T(R) = 0               for R < 0.25
         = 2(R − 0.25)     for 0.25 ≤ R ≤ 0.75
         = 1               for R > 0.75
We see that for s = 0 there are an infinite number of solutions to T(r) = s = 0 along the line 0 < r < 0.25. Similarly for s = 1. Therefore in these regions we have dT(r)/dr = 0, which gives

f_S(0) = ( ∫_{r=0}^{0.25} f_R(r) dr ) / 0,
[Figure 66 appears here: the piecewise-linear transformation S = T(R) plotted against R.]
which is meaningless! But we can calculate the probability in these regions directly. Assuming for this example that the original PDF is uniform, f_R(r) = 1 for 0 ≤ r ≤ 1,

f_S(0) = P(0 < R < 0.25) = ∫_{r=0}^{0.25} f_R(r) dr = 0.25,

and similarly f_S(1) = 0.25. For 0 < s < 1 there is a single root and dT(r)/dr = 2, so

f_S(s) = f_R(r_1) / 2 = 0.5.
So ∫_{−∞}^{∞} f_S(s) ds = 1 as required; however, the low intensity and high intensity regions have been saturated. Meanwhile, the contrast of the middle intensity values has been increased.
Figure 67 shows the result of the above transformation applied to the sample image.
[Figure 67 appears here: histogram of the transformed image, # occurrences against brightness.]
Figure 67: Image transformed to saturate the high and low intensity regions
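A minimal NumPy sketch of the piecewise-linear transformation analysed above, applied to a normalised image R with values in (0, 1):

    import numpy as np

    def stretch_contrast(R):
        """Saturate below 0.25 and above 0.75, stretch the middle range.
        np.clip implements the two saturated regions of T(R)."""
        return np.clip(2.0 * (R - 0.25), 0.0, 1.0)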
Histogram equalisation means transforming the image PDF so that all intensities provide an equal contribution.
This means we want

f_S(s) = 1 for 0 ≤ s ≤ 1.

Recall that

f_S(s) = f_R(r_1) / |dT(r)/dr|_{r=r_1},   where T(r_1) = s.
If we choose

s = T(r) = F_R(r) = ∫_{w=−∞}^{r} f_R(w) dw,

which is the cumulative distribution function (CDF) of the random variable R, then we have:

f_S(s) = f_R(r_1) / |dT(r)/dr|_{r=r_1}
       = f_R(r_1) / |dF_R(r)/dr|_{r=r_1}
       = f_R(r_1) / |f_R(r)|_{r=r_1}
       = f_R(r_1) / f_R(r_1)
       = 1.
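In practice the CDF is estimated from the image histogram. A minimal Python sketch of histogram equalisation along these lines (the function name and the rounding back to integer levels are my own choices):

    import numpy as np

    def equalise_histogram(img, B=8):
        """Map each pixel through the empirical CDF of the image brightness,
        as derived above."""
        levels = 2 ** B
        counts, _ = np.histogram(img, bins=levels, range=(0, levels))
        cdf = np.cumsum(counts) / img.size       # empirical F_R(r), in (0, 1]
        s = cdf[img.astype(int)]                 # s = T(r) = F_R(r) per pixel
        return np.round(s * (levels - 1)).astype(img.dtype)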
[Figure 68 appears here: equalised image histogram, # occurrences against brightness.]
6.3 2-D Fourier transform
6.3.1 Continuous 2D Fourier transform
• Up until now we have been using the 1-D Fourier transform to transform functions of a single variable (usually time):

F(jω) = ∫_{−∞}^{∞} f(t) e^{−jωt} dt.
• However, the Fourier transform is easily defined over many variables. Taking two variables, x and y:

F(jωx, jωy) = ∫_{x=−∞}^{∞} ∫_{y=−∞}^{∞} f(x, y) e^{−jωx x} e^{−jωy y} dy dx.
x could denote the x-co-ordinate of an image and y the y-co-ordinate. f(x, y) would be the grey-level intensity at that point. Here x and y are continuous variables, so our image would be an analog image (on photographic film for instance).
• The 1-D Fourier transform is a function of ω. Similarly, the 2-D Fourier transform is a function of ωx and ωy. These are spatial frequencies, and are measured in units of cycles per metre.
• Figure 69 shows what we might expect the magnitude of the 2-D Fourier transform of a simple image f(x, y) to look like.

[Figure 69 appears here: sketch of the image in the (x, y) domain and its transform in the (ωx, ωy) domain.]
Figure 69: Sketch of the 2-D Fourier transform of the sample image f(x, y).
6.3.2 Discrete 2-D Fourier transform

• Since we are sampling in two different directions we have two different sampling distances, Dx and Dy.
• Remember, for a discrete image, f[m, n], m indexes the rows (this is the y direction) and n indexes the columns (the x direction). The 2-D transform of the sampled image is then

F̃(e^{jωx Dx}, e^{jωy Dy}) = Σ_{m=0}^{M−1} Σ_{n=0}^{N−1} f[m, n] e^{−jωx n Dx} e^{−jωy m Dy}.
The transform will exhibit spectral leakage, because the image has already been `windowed' with a rectangular window (it is of finite extent). We can use a 2-D Hamming window if we like. It will have the same kind of effect we spoke about in the Spectral Analysis section: it will reduce spectral leakage at the expense of resolution.
Example: consider the 2 × 2 image with all pixels equal to 1 (M = N = 2). Then

F̃(e^{jωx Dx}, e^{jωy Dy}) = Σ_{m=0}^{M−1} Σ_{n=0}^{N−1} 1 · e^{−jωx n Dx} e^{−jωy m Dy}
  = 1 + e^{−jωx Dx} + e^{−jωy Dy} + e^{−jωx Dx} e^{−jωy Dy}
  = e^{−jωx Dx/2} e^{−jωy Dy/2} ( e^{jωx Dx/2} + e^{−jωx Dx/2} ) ( e^{jωy Dy/2} + e^{−jωy Dy/2} )
  = 4 e^{−jωx Dx/2} e^{−jωy Dy/2} cos(ωx Dx/2) cos(ωy Dy/2),

so

|F̃(e^{jωx Dx}, e^{jωy Dy})| = 4 |cos(ωx Dx/2) cos(ωy Dy/2)|.
This repeats every 2π so we only need to plot the range (ωx Dx , ωy Dy ) ∈ [−π, π] × [−π, π].
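A quick numerical check of this example in Python (taking Dx = Dy = 1, so the plotted variables are ωxDx and ωyDy):

    import numpy as np

    # Evaluate the 2-D DTFT of the all-ones 2x2 image on a grid of
    # (wx*Dx, wy*Dy) in [-pi, pi] and compare with the closed form above.
    f = np.ones((2, 2))
    wx, wy = np.meshgrid(np.linspace(-np.pi, np.pi, 101),
                         np.linspace(-np.pi, np.pi, 101))

    F = np.zeros_like(wx, dtype=complex)
    for m in range(f.shape[0]):
        for n in range(f.shape[1]):
            F += f[m, n] * np.exp(-1j * wx * n) * np.exp(-1j * wy * m)

    closed_form = 4 * np.abs(np.cos(wx / 2) * np.cos(wy / 2))
    print(np.max(np.abs(np.abs(F) - closed_form)))   # ~1e-15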
[Figure 70 appears here: surface plot of |F| against wx and wy.]

Figure 70: Magnitude of the 2-D Fourier transform of the 2 × 2 all-ones sequence f[m, n].
Figure 71: An image, the log of the magnitude of the Fourier transform and the phase.
Figure 71 shows an image and its Fourier transform. Figure 72 shows the image reconstructed using just amplitude information or just phase information. We see the importance of the phase information. Recall that in 1-D we filter a signal by convolving it with the filter impulse response,

y[n] = Σ_{k=0}^{N−1} h[k] x[n−k].
Figure 72: Reconstructed image using just magnitude (phase = 0) or just phase information
(magnitude=constant). We see the importance of the phase information.
6.4 Image filtering

We can similarly perform image filtering using 2-D convolution with the filter impulse response:

g[m, n] = Σ_{u=0}^{M−1} Σ_{v=0}^{N−1} h[u, v] f[m−u, n−v].
For example, suppose we smooth the image with the 2 × 2 impulse response

h = [ 1  1 ]
    [ 1  1 ],

which has a DC gain of 4. The resulting image, g[m, n], is smoother and 4 times brighter. Notice:

• The result is a blurring of the sharp lines and the attenuation of any high frequency detail.
• Also, we usually normalise the coefficients so that the DC gain is unity. Hence we would use

h = [ 0.25  0.25 ]
    [ 0.25  0.25 ].
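A minimal Python sketch of this 2-D low-pass filtering, using the normalised 2 × 2 averaging kernel above (scipy.signal.convolve2d performs the 2-D convolution; the random stand-in image is my own):

    import numpy as np
    from scipy.signal import convolve2d

    h = np.array([[0.25, 0.25],
                  [0.25, 0.25]])           # unity DC gain averaging kernel

    img = np.random.rand(64, 64)           # stand-in greyscale image
    smoothed = convolve2d(img, h, mode='same', boundary='symm')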
Figure 73: Image before and after low-pass filtering with h = [ 0.25 0.25 ; 0.25 0.25 ].
A high-pass filter, the Laplacian operator,

h = [  0  −1   0 ]
    [ −1   4  −1 ]
    [  0  −1   0 ],

can be shown to have a magnitude frequency response:

|H̃(e^{jωx Dx}, e^{jωy Dy})| = 4 − 2 cos(ωx Dx) − 2 cos(ωy Dy).

This is plotted in Figure 74 over the range (ωx Dx, ωy Dy) ∈ [−π, π] × [−π, π].
MPEG-4 supports segmentation of the image into different regions.
[Figure 74 appears here: surface plot of |H| against wx and wy.]
Figure 74: Magnitude of the transform of the Laplacian operator and the effect on the image after filtering.
6.5 The Discrete Cosine Transform (DCT)

• There are two main reasons why the DCT is preferred ahead of the DFT:
1. It returns only real numbers, whereas the DFT returns complex numbers.
2. For typical images it concentrates most of the signal energy into a small number of coefficients.
Consider a 1-D time signal. To obtain real coefficients we imagine that the N samples f(nT) are one half of a longer, even-symmetric signal f2(t): the samples are shifted by T/2 so that they lie at t = T/2, 3T/2, ..., and then reflected about t = 0 (c.f. the sketch below).
• If we take the Fourier transform of this even signal all the sine components will be zero, leaving only the cosine terms.
[Sketch appears here: f(t) with samples at T, 2T, 3T, ... and its even extension f2(t) with samples at ±T/2, ±3T/2, ...]
F̃2(e^{jωT}) = ∫_{−∞}^{∞} f2(t) e^{−jωt} dt
            = Σ_{n=−4}^{3} f2((2n+1) T/2) e^{−jω(2n+1)T/2}
            = Σ_{n=−4}^{3} f2((2n+1) T/2) cos(ω(2n+1)T/2)     since the function is even,
            = 2 Σ_{n=0}^{3} f(nT) cos(ω(2n+1)T/2)
            = 2 Σ_{n=0}^{3} f[n] cos((2n+1) ωT/2).
So we see that F̃2(e^{jωT}) is real for all ω. By evaluating it at ω = mπ/(NT), i.e. varying ω over the range [0, ((N−1)/N)(π/T)], we define the Discrete Cosine Transform of the sequence f[n] as

F[m] = c(m) Σ_{n=0}^{N−1} f[n] cos( (2n+1)mπ / (2N) )     for m = 0, ..., N−1,
where

c(m) = 1/√N        for m = 0
c(m) = √(2/N)      for m ≠ 0
are normalising constants which ensure that the signal energy stays the same after the transformation. The 2-D DCT of an N × N image, f[m, n], is defined similarly:

F(u, v) = c(u) c(v) Σ_{m=0}^{N−1} Σ_{n=0}^{N−1} f[m, n] cos( (2m+1)uπ / (2N) ) cos( (2n+1)vπ / (2N) )

for u, v = 0, ..., N−1
and

c(m) = 1/√N        for m = 0
c(m) = √(2/N)      for m ≠ 0.
The 2-D DCT therefore expresses the image as a weighted sum of the basis functions

e_{u,v}[m, n] = cos( (2m+1)uπ / (2N) ) cos( (2n+1)vπ / (2N) )

(Figure 76 shows these for N = 8).
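A minimal Python sketch of the 2-D DCT defined above; SciPy's dct with norm='ortho' applies the same c(m) scaling, and applying it along each axis in turn gives the separable 2-D transform:

    import numpy as np
    from scipy.fft import dct, idct

    def dct2(f):
        """Orthonormal 2-D DCT of an N x N block, matching the definition above."""
        return dct(dct(f, axis=0, norm='ortho'), axis=1, norm='ortho')

    def idct2(F):
        """Inverse of dct2."""
        return idct(idct(F, axis=0, norm='ortho'), axis=1, norm='ortho')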
• The image in Figure 77 has 32 × 32 = 1024 pixels, and therefore there are 1024 DCT coefficients.
• We notice that most of the energy in the coefficients is concentrated around the lower values of u and v.
• We see that while we are using roughly 10 times fewer bits to represent the image, it is still a recognisable approximation of the original.
6.6 JPEG
• The DCT forms the main building block of the JPEG (Joint Photographic Experts Group) compression standard.
This is not to be confused with JPEG2000, which uses a wavelet transform rather than the DCT.
Figure 76: 64 basis functions for N = 8. For example the basis for (u, v) = (0, 0) is at the
top left.
Figure 77: 32 × 32 image and a compressed version using just 100 of the DCT coecients.
• The JPEG standard breaks the image into blocks of 8 × 8 pixels and performs a DCT
on each block.
• The majority of the high frequency coefficients are effectively thrown away (see the sketch following this list).
• Figure 78 shows an enlarged picture of a JPEG encoded image. If you look closely you can just make out the 8 × 8 blocks.
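A toy Python sketch of this idea, keeping only the largest-magnitude DCT coefficients in each 8 × 8 block; the keep parameter is my own, and real JPEG instead quantises the coefficients with a perceptual table and entropy-codes them.

    import numpy as np
    from scipy.fft import dct, idct

    def compress_blocks(img, keep=10):
        """Block-wise 2-D DCT, discard all but the 'keep' largest-magnitude
        coefficients per 8x8 block, then invert."""
        out = np.zeros_like(img, dtype=float)
        for i in range(0, img.shape[0] - 7, 8):
            for j in range(0, img.shape[1] - 7, 8):
                block = img[i:i + 8, j:j + 8].astype(float)
                F = dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')
                thresh = np.sort(np.abs(F), axis=None)[-keep]
                F[np.abs(F) < thresh] = 0.0          # discard small coefficients
                out[i:i + 8, j:j + 8] = idct(idct(F, axis=0, norm='ortho'),
                                             axis=1, norm='ortho')
        return out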
Figure 78: Enlarged JPEG encoded image. The 8×8 blocks are just about evident.