Signal Processing 1 Script English v2017
LVA 389.166
Univ.-Prof. DI. Dr.-Ing. Markus Rupp
Preface
The deterministic examination of signals and systems experienced its classical period
in the forties and fifties through the work of Wiener and Küpfmüller, among others. Much
fundamental knowledge about linear systems originates from this period. Linear functional
transformations such as the Laplace, Fourier and z-transform became standard tools in those
times. The main applications lay in the field of filter design, first in the development of
analogue filters; later they experienced a renaissance in the design of digital filters.
The sixties were characterized by state-space descriptions (Kalman filter), which in
particular led to a paradigm shift in automatic control engineering. Furthermore, the
FFT (Fast Fourier Transform) was re-invented by Cooley and Tukey in 1965, who did not
know that Gauss had already used this technique in 1804.
The eighties were then shaped by designs of complete filter banks that split up
signals into small and controllable subunits, in the well-known tradition of divide et
impera that goes back to Philip II of Macedonia, albeit not in the field of signal processing.
With that, however, the classical themes were exhausted. Innovations now started to
emerge from other fields, namely from scientists who solved their problems with the help of
linear algebra. Today's signal processing is therefore a modern discipline, based on linear
algebra, which requires its own methodological skills.
This lecture is therefore structured into a classical part, which subsumes the significant
methods and results of the classical period together with some applications, and a second,
much larger part which deals only with modern methods. Only these modern methods make
it possible to understand such complicated concepts as stereo equalizer methods, big
data, radio interference suppression by beamforming, finding petrol and sunken treasures,
and whatever else is possible with subspace methods.
The lecture also spans the history of algorithmic development over the past two hundred
years, during which many mathematicians left their names on the methods. In the
last chapter we finally arrive at the newest techniques, which are currently being developed.
Nomenclature:
In this script we attempt to stick to the following nomenclature whenever possible.
∗ denotes the complex conjugate,
T denotes the transpose operator,
H denotes the Hermitian operator, thus the complex conjugate transpose,
a, b, c, ... describe deterministic scalars,
a, b, c, ... describe random variables with a probability density function f_a(a), variance σ_a^2 and mean ā,
a, b, c, ... describe (column) vectors with a given number of elements,
1 describes a vector with all elements equal to one,
a, b, c, ... denote (column) vectors with random variables as elements; the joint probability density function is given by f_a(a), and the autocorrelation matrix of, e.g., vector a is given by R_{aa} = E[a a^H],
A, B, C, ... denote matrices of defined dimension with scalar elements,
A, B, C, ... denote matrices of defined dimension with random variable elements,
I denotes the identity matrix of appropriate dimension,
‖a‖_p^q applied to the vector a indicates the p-norm to the power q, ‖a‖_p^q = (\sum_i |a_i|^p)^{q/p},
‖a‖_Q applied to the vector a indicates the norm \sqrt{a^H Q a}.
If there is an index k (lower case letter) introduced with matrices and vectors, then
k denotes an additional dependency on time. Random variables in this case have to be
interpreted as random processes. If an upper case letter is used as an index, on the other
hand, it mostly denotes the dimension of the vector. Matrices which are positive definite
are often described by the relation A > 0. In particular, this means that all eigenvalues of
A are larger than zero.
Chapter 1
The first property in (1.1) is called homogeneity, the second in (1.2) additivity and the
third (1.3) superposition. We denote by S[.] an operation, or an operator, which in this
case is a linear one. Such operators can be applied both to functions (thus time-continuous
mappings x(t)) and to series x_k.¹ We introduce the concept of linear operators here
very loosely. In Chapter 4 we will discuss the properties of linear operators exclusively and
learn a lot more details.
¹ In fact, to all objects of a linear vector space, as we will learn later.
1.1.1 Convolution
Familiar linear operators are functional transformations, such as the convolution of continuous functions:

H[x(t)] = \int_{-\infty}^{\infty} h(\tau)\,x(t-\tau)\,d\tau = \int_{-\infty}^{\infty} x(\tau)\,h(t-\tau)\,d\tau    (1.4)

H[x(t)] = \int_{0}^{\infty} h(\tau)\,x(t-\tau)\,d\tau = \int_{-\infty}^{t} x(\tau)\,h(t-\tau)\,d\tau    (1.5)

H[x(t)] = \int_{0}^{t_0} h(\tau)\,x(t-\tau)\,d\tau = \int_{t-t_0}^{t} x(\tau)\,h(t-\tau)\,d\tau.    (1.6)

The first case (1.4) represents a non-causal system, as typically considered in telecommunications
engineering. In the second case (1.5), the system h(τ) is causal but with an infinite
memory. Please note the two different notations: the former highlights the causal system
h(τ), where the range of τ is between zero and infinity. In the latter notation, the signal
x(t), which is present at the input of the system, is in the foreground. It acts upon the
range from −∞ to t, since the system is causal and thus values of x beyond t cannot have an effect. In the
third description (1.6) a causal system with finite memory t_0 is treated. In this context,
such systems are referred to as FIR filters (FIR = Finite Impulse Response). In FIR filters,
only the signal values x(t) within the duration from t − t_0 to t affect the output. Previous values
are irrelevant; they have been forgotten.
In analogy to continuous functions, the convolution can also be defined for series:
H[x_k] = \sum_{l=-\infty}^{\infty} h_l x_{k-l} = \sum_{l=-\infty}^{\infty} x_l h_{k-l}

H[x_k] = \sum_{l=0}^{\infty} h_l x_{k-l} = \sum_{l=-\infty}^{k} x_l h_{k-l}

H[x_k] = \sum_{l=0}^{k_0} h_l x_{k-l} = \sum_{l=k-k_0}^{k} x_l h_{k-l}.
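As a small illustration (not part of the original script), the causal, finite-memory form of the discrete convolution can be sketched in a few lines of Python; the filter h and input x below are arbitrary example values.

```python
import numpy as np

def fir_convolve(h, x):
    """Causal FIR convolution: y[k] = sum_{l=0}^{k0} h[l] * x[k-l]."""
    k0 = len(h) - 1
    y = np.zeros(len(x))
    for k in range(len(x)):
        for l in range(min(k, k0) + 1):   # only past inputs, finite memory k0
            y[k] += h[l] * x[k - l]
    return y

h = np.array([1.0, 0.5, 0.25])            # example impulse response
x = np.array([1.0, 0.0, 0.0, 2.0, 1.0])   # example input sequence
print(fir_convolve(h, x))
print(np.convolve(h, x)[:len(x)])          # same result via numpy's convolution
```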
Example 1.1 (Multipath Propagation) The radio wave propagation can be described
by a so-called multipath model. In this context, one imagines multiple (nP ) propagation
paths of the electromagnetic waves, each attenuated independently of the others
(α_n). The time delays τ_n are determined by the geometrical propagation times of the signals (waves).
Therefore, a linear impulse response h(τ) can be assigned to each path. The simplest case
with only one propagation path is thus given by h(τ) = α_1 δ(τ − τ_1). Because sender, receiver
and scattering objects can move independently of each other, the parameters α_n(t) and τ_n(t)
are time-dependent, and accordingly so is the impulse response

h(t,\tau) = \sum_{n=1}^{n_P} h_n(t,\tau) = \sum_{n=1}^{n_P} \alpha_n(t)\,\delta(\tau - \tau_n(t)).
Why is this time-variant system linear? If we examine the response to the input signal
x_1(t), we obtain

y_1(t) = \int_{\tau=-\infty}^{\infty} x_1(t-\tau) \sum_{n=1}^{n_P} \alpha_n(t)\,\delta(\tau - \tau_n(t))\,d\tau = \sum_{n=1}^{n_P} \alpha_n(t)\,x_1(t - \tau_n(t)) = H[x_1(t)].
Example 1.2 (Linear Prediction) The main applications of linear prediction are within
speech and image processing, but it is more generally applied to any strongly correlated
signal. The basic idea is that correlated random processes can be predicted to a certain
extent. An obvious approach to predict a signal is the linear combination (interpolation,
filtering) of past signal values:

\hat{x}_k = \sum_{n=1}^{n_P} a_n x_{k-n}.
Source-coding methods utilize this to transmit only the error e_k = x_k − \hat{x}_k instead of the
original signal x_k, because the error can be coded with considerably fewer bits.
Is this a linear system? The answer is: no. This is because the choice of the optimal
prediction coefficients depends on the statistics of the signal x_k. If we consider two input
signals x_{k,1} and x_{k,2} with different statistics, the sum of the corresponding output signals y_{k,1} and y_{k,2}
will differ from the output signal obtained for the input x_{k,1} + x_{k,2}. See also
Example 3.6 for the computation of the prediction coefficients a_n.
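To make the idea concrete, here is a small Python sketch (not from the script): the predictor coefficients are obtained by an ordinary least-squares fit, merely as a stand-in for the optimal computation treated later in Example 3.6; the AR(1) test signal and predictor order are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)
# Strongly correlated toy signal: AR(1) process x_k = 0.9 x_{k-1} + noise
x = np.zeros(500)
for k in range(1, len(x)):
    x[k] = 0.9 * x[k - 1] + 0.1 * rng.standard_normal()

p = 2  # predictor order (illustrative choice)
# Least-squares fit of x_k ~ sum_n a_n x_{k-n}; one of several ways, cf. Example 3.6
X = np.column_stack([x[p - n:len(x) - n] for n in range(1, p + 1)])
a, *_ = np.linalg.lstsq(X, x[p:], rcond=None)

x_hat = X @ a
e = x[p:] - x_hat                       # prediction error to be encoded
print("signal power:", np.var(x[p:]), "error power:", np.var(e))
```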
The operator description from Section 1.1 avoids this problem elegantly. Linear filters
are indicated by upper case letters, and typically, if operations on time-discrete series are
meant, the additional argument q^{-1} is used to make the notation unique. The simplest
operator q^{-1} denotes: q^{-1}[x_k] = x_{k-1}.²

Example 1.3 (IIR filter operator) For example, we write B(q^{-1})[v_k] = B[v_k] to denote
a series v_k which is filtered with the filter given by the coefficients b_0, b_1, ..., b_{n_B-1}. Recursive
filter structures can accordingly be denoted very easily: \frac{B(q^{-1})}{1-A(q^{-1})}[v_k] describes a recursive (IIR) filter.

Although very similar to the z-transform, we will prefer this notation, because it does
not require an (energy, Dirichlet) constraint on the input sequence, in contrast to the z-transform.
At first glance, z-transforms and notations in q^{-1} appear similar. But note
that they are fundamentally different, as z is a complex-valued number, while q^{-1} describes
an operator. As a consequence, z^{-1} can be multiplied onto any transfer function, say H(z),
while q^{-1} operates on (time) sequences h_k. Operations like z^* or |z| make no sense in q^{-1},
or need to be defined there first.
This description of time-discrete series or systems in terms of q^{-1} is also called a polynomial
description, because in most cases finite polynomials in q^{-1} arise.
H[.] = h_0 + h_1 q^{-1} + h_2 q^{-2} + \dots + h_n q^{-n}    ; causal system
H[.] = h_{-n} q^{n} + \dots + h_{-1} q^{1} + h_0 + h_1 q^{-1} + \dots + h_n q^{-n}    ; non-causal system.
The polynomial description allows for a simple mathematical description:
G(q^{-1})H(q^{-1}) = (g_0 + g_1 q^{-1} + \dots + g_{n_G} q^{-n_G})(h_0 + h_1 q^{-1} + \dots + h_{n_H} q^{-n_H})
Note that multiplying polynomials is commutative, i.e., the order can be changed.
Since this is an equivalent description of a convolution, this is true for any convolu-
tion as well and also for other equivalent descriptions of linear time-invariant systems,
such as Toeplitz or circulant matrices as we will learn later on (see in particular Chapter 5).
H[.] = H(q^{-1}) = 1 + a q^{-1} = h_0 + h_1 q^{-1}.

If a sequence x_k is the input of such a system, its output is y_k = x_k + a x_{k-1}. If the sequence
x_k can be transformed into the z-domain (because it satisfies the Dirichlet conditions):

X(z) = \sum_{k=-\infty}^{\infty} x_k z^{-k},

Y(z) = H(z)X(z)

with H(e^{jΩ}) = 1 + a e^{-jΩ}. If we are interested in its stability behavior, we can check the
existence of the system H in the complex plane:

H(z) = 1 + a z^{-1} = z^{-1}(z + a).

We find that finite values exist for all z except z = 0, and thus the ROC is given by all values of
z except zero. The zero of the z-transfer function, H(z) = 0, delivers z_0 = −a.
Let us now consider a second system that describes the input-output relation
yk = ayk−1 + xk .
The impulse response of such a system is h_k = a^k; k = 0,1,.... As long as |a| < 1 we
thus have a stable and causal system. But what about z? For which values of z is this
defined? To solve this we consider the z-transform again

H(z) = \sum_{k=0}^{\infty} a^k z^{-k} = \sum_{k=0}^{\infty} \left(\frac{a}{z}\right)^k.

We know that this sum can only converge as long as |a/z| < 1. We thus conclude that
|z| > |a|. The stability of such a system is determined by its poles. We can compute the poles of
this system by setting the denominator to zero and find z_∞ = a. Thus, as long as the poles
remain inside the unit circle, we have a stable and causal system. We can then use all
values of z with |z| > |a|. This also includes the values on the unit circle z = e^{jΩ}, allowing
us to compute the Fourier behavior of the system.
In both systems the operator description did not require any restriction on its use, while
for applying the z-transform we have to ensure that the signals satisfy the Dirichlet
conditions, i.e., they are sufficiently bounded, and that we only use values of z for which
the expressions exist.
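A brief numerical illustration (not part of the original script) of the recursive system y_k = a y_{k-1} + x_k: the example pole value is hypothetical, and standard scipy routines are used to obtain the impulse and frequency responses.

```python
import numpy as np
from scipy import signal

a = 0.8                          # example pole; |a| < 1 gives a stable, causal system
b, den = [1.0], [1.0, -a]        # y_k = a*y_{k-1} + x_k  <->  H(z) = 1/(1 - a z^{-1})

# impulse response h_k = a^k, obtained by filtering a unit impulse
impulse = np.zeros(8); impulse[0] = 1.0
h = signal.lfilter(b, den, impulse)
print(h)                         # [1, a, a^2, ...]

# pole location decides stability; values on the unit circle give the Fourier response
poles = np.roots(den)
print(poles, np.all(np.abs(poles) < 1))
w, H = signal.freqz(b, den, worN=8)
print(np.abs(H))
```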
Example 1.4 (Zero-Forcing Equalizer) Consider a linear transfer function in the form of
a polynomial H(q^{-1}). What does a polynomial G(q^{-1}) have to look like such that G(q^{-1})H(q^{-1}) =
q^{-D}, i.e., such that the linear distortion of the filter H can be compensated? Such problems
often arise, e.g., in the transmission over a static multipath radio propagation channel, or
in the transmission of loudspeaker signals (echoes) to the human ear. Because the transfer
function G(q^{-1}) of the receive filter ought to reconstruct the original signal, G(q^{-1}) is called
an "equalizer". Without solving the problem, one easily sees that there will not always be
a solution. If H(q^{-1}) has zeroes, those cannot be compensated. Later on (see the treatment
of minimum phase systems in Section 1.2.5) we will learn that this problem in general has
only a doubly infinite solution, thus a non-causal G(q^{-1}). To illuminate this a bit more,
investigate a channel with the transfer function H(q^{-1}) = h_0 + h_1 q^{-1}. The problem is then
given by:

H(q^{-1})G(q^{-1}) = (h_0 + h_1 q^{-1})(g_0 + g_1 q^{-1} + \dots + g_{n_G} q^{-n_G}) = q^{-D}.
This is a system of equations with two unknowns and three equations. If the number of
degrees of freedom is increased, from nG = 1 to nG = 2, one obtains a system with three
unknowns and four equations. By induction it can be seen that there is always one equation
more than unknowns, showing that the system of equations typically does not lead to a
solution.
A solution to this difficult problem only appears if more than one channel H is incorporated.
Theorem 1.1 (Bezout) Consider the transfer functions H1 (q −1 ),...,HnH (q −1 ) to be given.
Then, the equation
\sum_{n=1}^{n_H} H_n(q^{-1})\,G_n(q^{-1}) = q^{-D}    (1.7)

has a finite-dimensional solution G_1(q^{-1}),...,G_{n_H}(q^{-1}) if and only if (iff) the polynomials
H_1(q^{-1}),...,H_{n_H}(q^{-1}) do not share any common zeroes, i.e., they have to be co-prime.
Proof:
The proof (for polynomials of finite dimension) will proceed in two steps. In the first step
(Example 1.5) we show that if the polynomials have a common zero, no solution can exist.
In the second step we show that if they have no common zero, then there is at least one
solution.
Example 1.5 Let us consider two transfer functions H1 and H2 for which the following
holds: H1 (q −1 ) = (1 + hq −1 )C1 (q −1 ) and H2 (q −1 ) = (1 + hq −1 )C2 (q −1 ). Here, C1 (q −1 ) and
C2 (q −1 ) should not share any zeroes. To determine the solutions G1 (q −1 ) and G2 (q −1 ), the
following must hold:
q^{-D} = H_1(q^{-1})G_1(q^{-1}) + H_2(q^{-1})G_2(q^{-1})
      = (1 + hq^{-1})C_1(q^{-1})G_1(q^{-1}) + (1 + hq^{-1})C_2(q^{-1})G_2(q^{-1})
      = (1 + hq^{-1})\left(C_1(q^{-1})G_1(q^{-1}) + C_2(q^{-1})G_2(q^{-1})\right).

Since the right-hand side always contains the factor (1 + hq^{-1}), while q^{-D} has no zeroes, no choice of G_1(q^{-1}), G_2(q^{-1}) can satisfy the equation.

Let us now turn to the second step of the proof. Let us assume that the conditions of
the theorem are fulfilled. How does the previous example alter? If we have the two transfer
functions H_1(q^{-1}) = h_0^{(1)} + h_1^{(1)} q^{-1} and H_2(q^{-1}) = h_0^{(2)} + h_1^{(2)} q^{-1}, we can solve the problem
via two equalizers: G_1(q^{-1}) = g_0^{(1)} + g_1^{(1)} q^{-1} and G_2(q^{-1}) = g_0^{(2)} + g_1^{(2)} q^{-1}. Then, we obtain
the equation:

H_1(q^{-1})G_1(q^{-1}) + H_2(q^{-1})G_2(q^{-1}) = q^{-D}.
For D = 1, we find:

\begin{pmatrix} h_0^{(1)} & 0 & h_0^{(2)} & 0 \\ h_1^{(1)} & h_0^{(1)} & h_1^{(2)} & h_0^{(2)} \\ 0 & h_1^{(1)} & 0 & h_1^{(2)} \end{pmatrix}
\begin{pmatrix} g_0^{(1)} \\ g_1^{(1)} \\ g_0^{(2)} \\ g_1^{(2)} \end{pmatrix} =
\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}.
In contrast to the previous example we now have four unknowns, but only three equations.
This means that the system of equations can now be solved; strictly speaking, infinitely many
solutions exist. If we increase the number of free parameters from two to three, we obtain
six unknowns and five equations, and so on. By induction we can therefore show that we
always have more unknowns than equations. This proves Bezout's Theorem.
It also leaves open the question which of the infinitely many solutions should be taken.
The final answer to this problem will be provided in the context of the so-called minimum
norm solution in Chapter 3, see also Example 3.11.
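As an illustration (not from the script), the D = 1 system above can be solved numerically; the two example channels below are hypothetical, and the least-squares routine happens to return the minimum-norm solution among the infinitely many, anticipating the discussion in Chapter 3.

```python
import numpy as np

# Hypothetical co-prime channels H1(q^-1) = 1 + 0.5 q^-1 and H2(q^-1) = 1 - 0.8 q^-1
h1 = np.array([1.0, 0.5])
h2 = np.array([1.0, -0.8])

# For D = 1: three equations, four unknowns [g0_1, g1_1, g0_2, g1_2]
A = np.array([[h1[0], 0.0,   h2[0], 0.0],
              [h1[1], h1[0], h2[1], h2[0]],
              [0.0,   h1[1], 0.0,   h2[1]]])
target = np.array([0.0, 1.0, 0.0])            # coefficients of q^{-D} with D = 1

g, *_ = np.linalg.lstsq(A, target, rcond=None)  # minimum-norm solution (cf. Chapter 3)
g1, g2 = g[:2], g[2:]

# Verify: conv(h1,g1) + conv(h2,g2) should equal [0, 1, 0]
print(np.convolve(h1, g1) + np.convolve(h2, g2))
```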
H[x_k] = \sum_{n=0}^{n_H-1} h_n x_{k-n} = \sum_{n=k-n_H+1}^{k} x_n h_{k-n}
       = \left(h_0 + h_1 q^{-1} + \dots + h_{n_H-1} q^{-n_H+1}\right) x_k = y_k    (1.8)

y_k = h^T x_k = x_k^T h    (1.9)

h^T = [h_0, h_1, \dots, h_{n_H-1}]
x_k^T = [x_k, x_{k-1}, \dots, x_{k-n_H+1}].
We choose the representation in column vector form which we will hence always use. In
the strict sense, the series in (1.8) is the response for the input-series xk , whereas (1.9)
represents a single value as the response for a composition of input values xk . Equation
(1.9), however, can also be interpreted as an output-series yk as response for the series of
input-vectors xk .
If we do not want to consider a single value yk but a sorted set y0 ,...ynK−1 of values, the
Note that the occurring matrix is of dimension n_K × n_K and, due to its structure, the
matrix is regular and thus invertible (h_0 ≠ 0). The description also allows for a backward
calculation of the input signal from the output signal. The short notation for this is: y =
Hx. Furthermore, this description includes the initial values since we start with the time
index k = 0. Disadvantageous, on the other hand, is the property that the matrices (and
vectors) grow with increasing time index. Note that often the reverse order of the vectors
is applied, resulting in an alternative representation:
\begin{pmatrix} y_{n_K-1} \\ y_{n_K-2} \\ y_{n_K-3} \\ \vdots \\ y_1 \\ y_0 \end{pmatrix} =
\begin{pmatrix}
h_0 & h_1 & \dots & h_{n_H-1} & & \\
 & h_0 & h_1 & \dots & h_{n_H-1} & \\
 & & h_0 & h_1 & \dots & h_{n_H-1} \\
 & & & \ddots & \ddots & \\
 & & & & h_0 & h_1 \\
 & & & & & h_0
\end{pmatrix}
\begin{pmatrix} x_{n_K-1} \\ x_{n_K-2} \\ x_{n_K-3} \\ \vdots \\ x_1 \\ x_0 \end{pmatrix}    (1.11)
The short notation for this is: y (r) = H T x(r) , r denoting backward or reverse notation.
The vector h and thus each of the lower rows of H contains the same information.
Because the initial values are often not of interest in communications engineering, the band
matrix description presents itself as an alternative presentation of the problem.
\begin{pmatrix} y_{k-n_N+1} \\ y_{k-n_N+2} \\ \vdots \\ y_{k-1} \\ y_k \end{pmatrix} =
\begin{pmatrix}
h_{n_H-1} & \dots & h_1 & h_0 & & & \\
 & h_{n_H-1} & \dots & h_1 & h_0 & & \\
 & & \ddots & & & \ddots & \\
 & & & h_{n_H-1} & \dots & h_1 & h_0
\end{pmatrix}
\begin{pmatrix} x_{k-n_N-n_H+2} \\ \vdots \\ x_{k-2} \\ x_{k-1} \\ x_k \end{pmatrix}

or in short notation y_k = H_{n_N} x_k, where the index n_N denotes the dimension of the (n_H +
n_N + 1) × n_N matrix H_{n_N}. The band matrix description is very beneficial if several linear
systems are connected serially, because then again band matrices emerge. However, these
matrices are then not invertible anymore because they have a rectangular form, i.e., it is
not easily possible to calculate the input-signal from the output-signal. This is not very
astonishing given that there are no initial values in this description.
Example 1.6 Let us now consider the transmission of a training sequence at the beginning
of a data block. The training sequence is six symbols long and is sent through a channel
with transfer function h0 + h1 q −1 . The first description yields:
\begin{pmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \end{pmatrix} =
\begin{pmatrix}
h_0 & & & & & & \\
h_1 & h_0 & & & & & \\
 & h_1 & h_0 & & & & \\
 & & h_1 & h_0 & & & \\
 & & & h_1 & h_0 & & \\
 & & & & h_1 & h_0 & \\
 & & & & & h_1 & h_0
\end{pmatrix}
\begin{pmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 = 0 \end{pmatrix}.
It also contains the initial-values. If the training sequence is only seen as a part of an
infinitely long set of symbols, then the initial-values will be irrelevant. In that case the
following description may be more appropriate:
\begin{pmatrix} y_{k+1} \\ y_{k+2} \\ y_{k+3} \\ y_{k+4} \\ y_{k+5} \end{pmatrix} =
\begin{pmatrix}
h_1 & h_0 & & & & \\
 & h_1 & h_0 & & & \\
 & & h_1 & h_0 & & \\
 & & & h_1 & h_0 & \\
 & & & & h_1 & h_0
\end{pmatrix}
\begin{pmatrix} x_k \\ x_{k+1} \\ x_{k+2} \\ x_{k+3} \\ x_{k+4} \\ x_{k+5} \end{pmatrix}.
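The following Python sketch (not from the script) builds both matrix descriptions for a hypothetical 2-tap channel and an arbitrary 6-symbol training sequence, using a standard Toeplitz constructor; it only illustrates the structure discussed above.

```python
import numpy as np
from scipy.linalg import toeplitz

# Hypothetical 2-tap channel h0 + h1 q^{-1} and a 6-symbol training sequence
h = np.array([1.0, 0.4])                  # [h0, h1]
x = np.array([1, -1, 1, 1, -1, 1.0])

# Full description including initial values: y = H x, lower-triangular Toeplitz H
col = np.concatenate([h, np.zeros(len(x) - len(h) + 1)])       # first column [h0, h1, 0, ...]
row = np.concatenate([[h[0]], np.zeros(len(x) - 1)])           # first row [h0, 0, ...]
H = toeplitz(col, row)                                         # 7 x 6 convolution matrix
print(H @ x)
print(np.convolve(h, x))                  # identical result

# Band-matrix description (initial values dropped): keep only fully overlapping rows
H_band = H[len(h) - 1:len(x), :]          # 5 x 6 band matrix
print(H_band @ x)
```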
H(q^{-1}) = \frac{b_0 + b_1 q^{-1} + \dots + b_{n_B-1} q^{-n_B+1}}{1 + a_1 q^{-1} + \dots + a_{n_A} q^{-n_A}}
          = \frac{b_0 + b_1 q^{-1} + \dots + b_p q^{-p}}{1 + a_1 q^{-1} + \dots + a_p q^{-p}}.    (1.12)
The name canonical is due to the fact that this description is a basic structure that can, by
variations, be combined into new complex units. This is somewhat similar to the canon in
music theory, which is varied from strophe to strophe. In order not to handle two memory
variables n_A, n_B, we choose p = max{n_A, n_B − 1}. Here, either a_p or b_p can be zero, but
not both, and hence p determines the order of the filter. A Finite Impulse Response (FIR)
filter is thus the special case a_1 = a_2 = ... = a_p = 0. The following Figure 1.1 clarifies this
connection in the form of a signal flow diagram. The outputs of the delay chain are denoted
[Figure 1.1: signal flow diagram of the canonical filter structure with coefficients b_0,...,b_p, a_1,...,a_p and delay elements q^{-1}]
z_k^{(1)}, \dots, z_k^{(p)}, which are the states of the system. If the states are known, the output signal y_k can
be calculated for every input-signal xk . Accordingly, the state values adopt the role of the
initial values, not only for the origin k = 0, but for arbitrary time instances k. Thus, the
whole effect of the input sequence up to time k is accumulated in the system states. From
the figure, we can determine the following relations:
In both equations the states appear. Let us combine the states at time instance k in a
vector z k . Then, we obtain the much more compact description:
\underbrace{\begin{pmatrix} z_{k+1}^{(1)} \\ z_{k+1}^{(2)} \\ \vdots \\ z_{k+1}^{(p)} \end{pmatrix}}_{z_{k+1}} =
\underbrace{\begin{pmatrix} & 1 & & \\ & & \ddots & \\ & & & 1 \\ -a_p & -a_{p-1} & \dots & -a_1 \end{pmatrix}}_{A}
\underbrace{\begin{pmatrix} z_k^{(1)} \\ z_k^{(2)} \\ \vdots \\ z_k^{(p)} \end{pmatrix}}_{z_k}
+ \underbrace{\begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix}}_{b} x_k

y_k = [b_p, b_{p-1}, \dots, b_1]\, z_k + b_0\left(x_k - [a_p, a_{p-1}, \dots, a_1]\, z_k\right)
    = \underbrace{[b_p - b_0 a_p,\; b_{p-1} - b_0 a_{p-1}, \dots, b_1 - b_0 a_1]}_{c^T}\, z_k + \underbrace{b_0}_{d}\, x_k.
And this can be written compactly as:
z_{k+1} = A z_k + b\, x_k    (1.15)
y_k = c^T z_k + d\, x_k.
Naturally, the question regarding the relation to the transfer function arises. If the
transfer function is given in the form of (1.12), then it can be transformed into the form
(1.15). Conversely, starting from (1.15) and applying the z-transform, one obtains:

\mathcal{Z}[z_{k+1}] = z\,Z(z) = A Z(z) + b X(z)    (1.16)
Y(z) = c^T Z(z) + d X(z).    (1.17)

The first Equation (1.16) can be solved for Z(z), leading to:

Z(z) = (zI - A)^{-1} b\, X(z).

That can be inserted into the second Equation (1.17), finally resulting in:

Y(z) = c^T (zI - A)^{-1} b\, X(z) + d\, X(z) = \underbrace{\left(c^T (zI - A)^{-1} b + d\right)}_{H(z)} X(z).
A first advantage of the state-space description is thus that we obtain a very compact
description:
\begin{pmatrix} z_{k+1} \\ y_k \end{pmatrix} = \begin{pmatrix} A & b \\ c^T & d \end{pmatrix} \begin{pmatrix} z_k \\ x_k \end{pmatrix}.
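A minimal Python sketch (not part of the script) of the companion-form state space for a hypothetical second-order filter; it simulates (1.15) sample by sample and compares against the direct difference-equation filtering.

```python
import numpy as np
from scipy import signal

# Hypothetical example: H(q^-1) = (b0 + b1 q^-1 + b2 q^-2)/(1 + a1 q^-1 + a2 q^-2)
b = np.array([1.0, 0.5, 0.25])          # b0, b1, b2
a = np.array([1.0, -0.4, 0.1])          # 1, a1, a2

p = 2
A = np.array([[0.0, 1.0],
              [-a[2], -a[1]]])          # companion form as in (1.15)
bvec = np.array([0.0, 1.0])
c = np.array([b[2] - b[0] * a[2], b[1] - b[0] * a[1]])
d = b[0]

# Simulate z_{k+1} = A z_k + b x_k, y_k = c^T z_k + d x_k for an impulse input
x = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
z = np.zeros(p)
y = np.zeros_like(x)
for k, xk in enumerate(x):
    y[k] = c @ z + d * xk
    z = A @ z + bvec * xk
print(y)                                 # impulse response from the state-space model
print(signal.lfilter(b, a, x))           # same response from the difference equation
```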
Example 1.7 Let us consider the signal flow diagram in the left part of Figure 1.2. We
can identify a simple feedback, as it is, e.g., used to average signals. We want to find the
canonical structure in the state-space form. The input-output relations are:
yk = ayk−1 + dxk .
By applying the z-transform, we obtain

Y(z) = a Y(z) z^{-1} + d X(z) = \frac{d}{1 - a z^{-1}}\, X(z).
Accordingly, we can determine: a1 = −a, b0 = d. The canonical structure of the IIR filter
can be seen in the right part of Figure 1.2. The state-space description is then given by
zk+1 = xk − a1 zk
yk = b0 zk+1 .
In more compact notation, we obtain:
\begin{pmatrix} z_{k+1} \\ y_k \end{pmatrix} =
\begin{pmatrix} -a_1 & 1 \\ -b_0 a_1 & b_0 \end{pmatrix}
\begin{pmatrix} z_k \\ x_k \end{pmatrix} =
\begin{pmatrix} a & 1 \\ da & d \end{pmatrix}
\begin{pmatrix} z_k \\ x_k \end{pmatrix}.
Figure 1.2: Signal-flow diagram of an elementary feedback system, left: arbitrary signal-flow
graph, right: corresponding, canonical description.
It is relatively easy to determine the eigenvalues from this companion form, and thus the
dynamic behaviour of the system. Note that the state-space description is not unique.
We only considered the canonical structure, but in principle infinitely many forms are possible,
which all access inner states to describe the same system. Other representations of the
same system can be found by similarity transformations. A similarity transformation is given
by any regular matrix T, i.e., its inverse T^{-1} exists. Then, the input and output can
uniquely be transformed, so that this path can also be followed backwards. We substitute
A = T A' T^{-1} and multiply (1.15) with T^{-1} from the left:

\underbrace{T^{-1} z_{k+1}}_{z'_{k+1}} = A'\,\underbrace{T^{-1} z_k}_{z'_k} + \underbrace{T^{-1} b}_{b'}\, x_k.    (1.18)
To be able to describe the output in the transformed state z'_k, we also modify the output
equation:

y_k = c^T z_k + d x_k = \underbrace{c^T T}_{c'^T}\, z'_k + d x_k.

Hence, if the transformation matrix T is known, all involved variables can easily be converted:

A' = T^{-1} A T;\quad b' = T^{-1} b;\quad c' = T^T c;\quad d' = d.    (1.19)
Notably advantageous are transformations T in which A' shows a diagonal structure (or a
Jordan structure, see also Equation (4.1) and the discussion there), because then its eigenvalues
can directly be determined. Accordingly, the remaining question is which matrix T
transforms the matrix A into a matrix D of diagonal structure. The answer is the following:
from (1.19) we know that T D = AT must hold. For this we find the Vandermonde matrix

T = \begin{pmatrix} 1 & 1 & \dots & 1 \\ \lambda_1 & \lambda_2 & \dots & \lambda_m \\ \vdots & & & \vdots \\ \lambda_1^{m-1} & \lambda_2^{m-1} & \dots & \lambda_m^{m-1} \end{pmatrix}
which diagonalises the companion form. If all eigenvalues are different then the matrix is
regular and thus invertible. Note, however, that we need to know the eigenvalues first in
order to construct this matrix.
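A brief Python illustration (not from the script): for a hypothetical second-order companion matrix, the eigenvalues are computed first and then the Vandermonde matrix built from them indeed diagonalises the companion form.

```python
import numpy as np

# Companion matrix of 1 + a1 q^-1 + a2 q^-2 (hypothetical coefficients)
a1, a2 = -0.9, 0.2
A = np.array([[0.0, 1.0],
              [-a2, -a1]])

lam = np.linalg.eigvals(A)                  # the eigenvalues must be known first
T = np.vander(lam, increasing=True).T       # Vandermonde matrix [[1, 1], [lam1, lam2]]
D = np.linalg.inv(T) @ A @ T                # should be diag(lam1, lam2)
print(lam)
print(np.round(D, 12))
```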
Example 1.8 Let us expand our averaging filter from the previous example by one pole, so
that we obtain
yk = −a1 yk−1 − a2 yk−2 + dxk .
By applying the z-transform, we obtain the canonical form
Y(z) = \frac{d}{1 + a_1 z^{-1} + a_2 z^{-2}}\, X(z)
Via the state-space solution (1.20), also the solution for the output of the time-variant
system can be given in a closed-form expression:
y_k = C_k z_k + D_k x_k = C_k \Phi(k,j)\, z_j + C_k \sum_{l=j}^{k-1} \Phi(k,l+1) B_l x_l + D_k x_k.

For the special case of a time-invariant system {A,B,C,D}, the transition matrix must satisfy

\Phi(k,j) = A^{k-j}, \quad k \geq j.

Under special circumstances the time-invariant system can be diagonalized. Then

A = \operatorname{diag}\{\lambda_1, \lambda_2, \dots, \lambda_m\}.
Example 1.9 (cable) Let us first consider a linear, time-invariant system H[.] with the
FIR impulse response h (e.g., a static radio transmission channel) and additive interference
wk . Then, we obtain:
z k+1 = Iz k + 0wk ; z0 = h
yk = xTk z k + wk .
Please note that the static channel can very easily be described by a simple state equation.
Example 1.10 (wireless) Now let us consider another, advanced example with a time-
variant transfer function, thus a typical mobile radio channel, which changes over time:
z k+1 = Az k + 0wk ; z0 = h
yk = xTk z k + wk .
The only change in comparison to the previous example is the introduction of the matrix
A, which now describes the dynamic behaviour of the radio channel. Please note that a
time-invariant notation is sufficient for the description of this time-variant radio channel.
Of course, this can not be generalized. Certain time-variant channels require a description
by time-variant state-space equations. Nevertheless, one recognizes in this example that the
state-space description allows for descriptions which have not been possible before.
The reader may argue that the channel will now fade away if all eigenvalues of A are inside
the unit circle, and never recover. Such behavior can also be described by adding a driving force
for the channel:

z_{k+1} = A z_k + 0\cdot w_k + v_k; \quad z_0 = h
y_k = x_k^T z_k + w_k + 0\cdot v_k.

Now the second noise source v_k drives the channel back up and lets it fluctuate. It is not a
direct noise source at the output; that role is played by w_k.
j\omega = \frac{2}{T}\,\frac{e^{j\Omega} - 1}{e^{j\Omega} + 1} = j\,\frac{2}{T}\tan\frac{\Omega}{2},    (1.21)

\Omega = 2\arctan\frac{\omega T}{2},

d\Omega = \frac{T\,d\omega}{1 + \frac{T^2\omega^2}{4}}.
The time parameter T corresponds to the sampling time interval, assuming equidistant
sampling. Such transformation is a special case of the so called Möbius transform3 .
³ August Ferdinand Möbius (1790–1868), German mathematician. In the English control literature it is
often referred to as Tustin's method, after Arnold Tustin (16 July 1899 – 9 January 1994), a British
engineer.
Example: Consider an analog low pass realized by an RC filter (voltage divider). Its
transfer function in the Laplace domain reads
H(s) = \frac{1}{1 + sRC}.

Its impulse response is given by

h(t) = \begin{cases} \frac{1}{RC}\, e^{-t/RC} & ; t \geq 0 \\ 0 & ; \text{else} \end{cases}.

This is obviously an exponentially decaying function which can be described by

h(kT) = \frac{1}{RC}\, e^{-kT/RC} = \frac{1}{RC}\left(e^{-T/RC}\right)^k = \frac{1}{RC}\, a^k = h_k, \quad k = 0,1,....

The z-transform of such a sequence is given by

H(z) = \sum_{k=0}^{\infty} h_k z^{-k} = \frac{1}{RC}\sum_{k=0}^{\infty}\left(\frac{a}{z}\right)^k = \frac{1}{RC}\,\frac{1}{1 - a z^{-1}} = \frac{1}{RC}\,\frac{1}{1 - e^{-T/RC} z^{-1}}.
Alternatively, let us now apply the bi-linear transform to H(s), i.e., s = \frac{2}{T}\frac{z-1}{z+1}. We obtain

H_B(z) = \frac{1}{1 + \frac{2RC}{T}\frac{z-1}{z+1}} = \frac{z+1}{z + 1 + \frac{2RC}{T}(z-1)} = \frac{1}{1 + \frac{2RC}{T}}\cdot\frac{1 + z^{-1}}{1 + \frac{T-2RC}{T+2RC}\, z^{-1}}
which is indeed a very different z-transform and thus also a different impulse response than
the previous one. The corresponding impulse response reads:

h_k = \left(\frac{1}{1+\frac{2RC}{T}} - \frac{1}{1-\frac{2RC}{T}}\right)\left(-\frac{T-2RC}{T+2RC}\right)^{k}; \quad k = 0,1,....

In Figure 1.3 we show this example for RC/T = 4 with T = 1. The smaller T is compared to
RC, the better the match.
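The comparison of Figure 1.3 can be reproduced numerically; the short Python sketch below (not from the script) uses the standard bilinear (Tustin) routine with the same hypothetical values RC = 4, T = 1, and contrasts the resulting impulse response with the sampled analog one.

```python
import numpy as np
from scipy import signal

RC, T = 4.0, 1.0                      # RC/T = 4 as in Figure 1.3
num, den = [1.0], [RC, 1.0]           # H(s) = 1/(1 + sRC)

# Sampled analog impulse response h_k = (1/RC) * exp(-kT/RC)
k = np.arange(20)
h_sampled = np.exp(-k * T / RC) / RC

# Bilinear (Tustin) transform gives a different discrete-time system H_B(z)
bz, az = signal.bilinear(num, den, fs=1.0 / T)
impulse = np.zeros(20); impulse[0] = 1.0
h_bilinear = signal.lfilter(bz, az, impulse)

print(h_sampled[:5])
print(h_bilinear[:5])                 # the two impulse responses differ, cf. Figure 1.3
```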
[Figure 1.3: impulse responses h(t), h_k and h_{B,k} of the RC low-pass and its two discretizations, plotted over t = kT]
We call such a system bounded-input bounded-output (BIBO) stable if and only if ℜ{s_i} < 0.
We make use of the unit-step function here⁴

U(t) = \begin{cases} 0 & ; t < 0 \\ \frac{1}{2} & ; t = 0 \\ 1 & ; t > 0 \end{cases}.

BIBO stability refers to the property that if the input amplitude does not exceed a certain
value, max |x(t)| ≤ A < ∞, then correspondingly the output satisfies max |y(t)| ≤ B < ∞.

⁴ Note the subtlety for t = 0.
Proof: Consider the output y(t) of a linear system h(t) for the input x(t):

y(t) = h(t) * x(t) = \int_0^{\infty} h(t-\tau)\, x(\tau)\, d\tau

|y(t)| \le \max|x(t)| \int_0^{\infty} h(\tau)\, d\tau
       \le \max|x(t)| \int_0^{\infty} \sum_{i=1}^{p} d_i e^{s_i \tau}\, d\tau
       \le \max|x(t)| \sum_{i=1}^{p} \left.\frac{d_i}{s_i}\, e^{s_i \tau}\right|_0^{\infty}
       \le \max|x(t)| \sum_{i=1}^{p} \left|\frac{d_i}{s_i}\right| \left|e^{s_i \infty} - 1\right|.

The last term e^{s_i \infty} can only be bounded if the real part of s_i is negative.
How about the stability of a time-discrete system? We simply apply the bi-linear transform
and find for ℜ{s} < 0:

|z|^2 − 1 < 0,

thus z needs to lie inside the unit circle for stability of a causal system.
Theorem 1.3 (Stability of closed loop) The closed-loop system is BIBO stable if the
open-loop system satisfies:

\max_\omega |H(j\omega)G(j\omega)| < 1.

\frac{Y(j\omega)}{X(j\omega)} = \frac{H(j\omega)}{1 + G(j\omega)H(j\omega)}

As long as |G(jω)H(jω)| < 1, the system behaves stably, which concludes the proof. Note,
however, that even values larger than one can result in stable systems. Linear time-invariant
systems of this form allow very precise statements on stability (if and only if) by just
analyzing the open loop.
1.2.2 Causality
In principle, we know the relation between a continuous function and its Fourier-transform:
F(j\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-j\omega t}\, dt
The complex-valued function F(jω) can be separated into magnitude and phase⁵: F(jω) = A(ω)e^{jφ(ω)}, where for real-valued f(t) the phase φ(ω)
is an odd function⁶.
A system is called causal if, for an excitation at t = t_0, no responses of the
system occur prior to t_0. In the impulse response h(τ), this is expressed by h(τ) = 0 ∀ τ <
0, thus the impulse response has to be right-sided (Ger.: rechtsseitig). Let us first investigate a
general continuous function f(t). Such a function can always be decomposed into an even
and an odd part:

f_e(t) = \frac{1}{2}\left[f(t) + f(-t)\right]
f_o(t) = \frac{1}{2}\left[f(t) - f(-t)\right].
Note that for causal systems with t > 0, the following must hold: f (t) = 2fe (t) = 2fo (t).
⁵ Please note that we use a description with positive phase. Some authors use the negative form e^{-jφ(ω)},
which accordingly leads to different terms with alternating signs.
⁶ Often in the literature the function arc(x) is used to compute the phase of a complex-valued argument x.
In general the Fourier-transform can be divided into its real and imaginary part:

F(j\omega) = \int_{-\infty}^{\infty} f(t)\,e^{-j\omega t}\,dt
           = \int_{-\infty}^{\infty} \left(f_e(t) + f_o(t)\right)e^{-j\omega t}\,dt
           = \int_{-\infty}^{\infty} \left(f_e(t) + f_o(t)\right)\left\{\cos(\omega t) - j\sin(\omega t)\right\}dt
           = \int_{-\infty}^{\infty} \left(f_e(t) + f_o(t)\right)\cos(\omega t) - j\left(f_e(t) + f_o(t)\right)\sin(\omega t)\,dt
           = \int_{-\infty}^{\infty} f_e(t)\cos(\omega t) - j f_o(t)\sin(\omega t) + \underbrace{f_o(t)\cos(\omega t) - j f_e(t)\sin(\omega t)}_{\int(\cdot)\,dt\,=\,0}\,dt
This description clearly shows that with causal, linear systems it is entirely sufficient
to know either the real or the imaginary part. Both are uniquely connected. This conclusion can
be formulated more clearly by converting the two equations.
To show this, let us start with the trivial equality for causal functions: f(t) = f(t)U(t) + \frac{1}{2} f(0)\delta(t).
We added here the single point f(0) since the step function gives U(0) = \frac{1}{2}. Computing
the Fourier transform on both sides, we obtain

F_R(j\omega) + jF_I(j\omega) = \frac{1}{2\pi}\, F(j\omega) * \left(\pi\delta(\omega) + \frac{1}{j\omega}\right) + \frac{f(0)}{4\pi}
                            = \frac{1}{2}\, F(j\omega) + \frac{1}{2\pi j}\, F(j\omega) * \frac{1}{\omega} + \frac{f(0)}{4\pi}
                            = \frac{1}{\pi j}\, F(j\omega) * \frac{1}{\omega} + \frac{f(0)}{2\pi}.

Separating this into real and imaginary parts yields:

F_R(j\omega) = \frac{2}{\pi}\int_0^{\infty} \frac{F_I(j\omega')}{\omega - \omega'}\,d\omega' + F_R(\infty)

F_I(j\omega) = -\frac{2}{\pi}\int_0^{\infty} \frac{F_R(j\omega')}{\omega - \omega'}\,d\omega'.
The last two equations show that this is a convolution in the Fourier-domain with the
function 1/ω. This corresponds to a multiplication with the sign function in the time-
domain which is exactly the operation that is necessary to link even and odd part with
each other:
fe (t) = fo (t)sgn(t) + f (0)δ(t).
The above description with the convolution in the Fourier-domain is also called the Hilbert-
transform of a function. The real part requires an additional term,
F_R(j\omega) = F_R(\infty) + \frac{2}{\pi}\int_0^{\infty} \frac{F_I(j\omega')}{\omega - \omega'}\,d\omega',    (1.23)
due to the singular point in the unit step function (also in the sign function). Note
that the previous propositions are strictly speaking only valid for t > 0. For t = 0,
a separate investigation is needed. Because fo (t = 0) = 0 must hold, but fo (t) and
FI (jω) are uniquely coupled, f (0) cannot be determined out of FI (jω). For this, we
need a separate term, which occurs in (1.23) according to the final value theorem as FR (∞).
Further alternative formulations exist, but we only want to mention this one:
F_I(j\omega) = -\frac{2\omega}{\pi}\int_0^{\infty} \frac{F_R(j\omega')}{\omega^2 - \omega'^2}\,d\omega'

F_R(j\omega) = F_R(\infty) + \frac{2}{\pi}\int_0^{\infty} \omega'\,\frac{F_I(j\omega')}{\omega^2 - \omega'^2}\,d\omega'.
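A small numerical illustration (not part of the script) of the statement that for causal systems one part of the spectrum suffices: for an arbitrary causal example sequence, the real part of its discrete Fourier transform alone allows the sequence to be recovered, mirroring the Hilbert-transform relation above.

```python
import numpy as np

# Causal example sequence (hypothetical values), zero-padded to length N
N = 16
h = np.zeros(N)
h[:4] = [1.0, 0.7, -0.3, 0.2]

H_real = np.real(np.fft.fft(h))          # keep only the real part of the spectrum

# The inverse transform of Re{H} is the even part h_e of h;
# for a causal h we recover h = 2*h_e for positive time and h[0] = h_e[0].
h_even = np.real(np.fft.ifft(H_real))
h_rec = 2.0 * h_even
h_rec[0] = h_even[0]
h_rec[N // 2 + 1:] = 0.0                 # discard the mirrored (negative-time) half
print(np.allclose(h_rec[:N // 2], h[:N // 2]))   # True
```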
Hence we now know that real and imaginary parts of the Fourier-transform of causal
systems are connected by the Hilbert-transform. Now the question arises whether there
exists a similar relation between the magnitude and the phase. It turns out that this is not
the case, but they are also not entirely independent of each other. A connection, however
in a very different form, is given by what is described as the Paley-Wiener theorem.
Theorem 1.4 (Paley-Wiener) Let H(jω) = A(ω)e^{jφ(ω)}. If for an even, non-negative
magnitude function A(ω) the energy constraint

\int_{-\infty}^{\infty} A^2(\omega)\,d\omega < \infty

is fulfilled, then a phase function φ(ω) exists such that H(jω) is causal, if and only if

\int_{-\infty}^{\infty} \frac{|\ln A(\omega)|}{1+\omega^2}\,d\omega < \infty.

A corresponding proposition for time-discrete systems directly follows if the bi-linear transform (1.21) is applied. We
present here a normalized form in which the term T/2 of the bi-linear transform is simply
omitted. As we only test whether the terms are finite or not, scaling with a fixed constant
does not change the result. Please note that the conclusion of the theorem of Paley-Wiener
explains the existence of a phase function, but not its uniqueness. To see that, think of a
valid function φ(ω) and add a delay e^{-jωT}: then φ̃(ω) = φ(ω) − ωT is also a valid phase
function.
Likewise, note that the theorem makes a very strong statement which is also invertible:
if and only if the system H(jω) is causal does the previous condition hold. In practice, the theorem
is often difficult to apply. If, for example, a magnitude function A(ω) is given, for which it
was successfully tested by applying the theorem that a causal system exists, the problem
still remains how to find H(jω), e.g., in the form A²(ω) = H(jω)H*(jω). This is not a trivial
problem and the question will confront us in the next chapter (see Example 2.13).
Example 1.11 Consider a low-pass filter of first order with the transfer function

H_{LP}(j\omega) = \frac{1}{1 + j\omega c}.

Thus, for the magnitude function, we have

A_{LP}(\omega) = \frac{1}{\sqrt{1 + \omega^2 c^2}}.
Example 1.12 If we examine the magnitude function of an ideal low-pass filter, i.e.,

A_{IL}(\omega) = \begin{cases} 1 & ; |\omega| < \omega_g \\ 0 & ; \text{else} \end{cases},

the first term is zero, because ln(1) = 0, but the second term is not finite, since ln(0) = −∞.
Hence we conclude that the ideal low-pass filter is not a causal system and thus not realizable.
In principle, this is the case for all ideally band-limited systems.
f(t)\,e^{j\omega_o t} \Leftrightarrow F(j(\omega - \omega_o))

f(t)\cos(\omega_o t) \Leftrightarrow \frac{1}{2}\left\{F(j(\omega - \omega_o)) + F(j(\omega + \omega_o))\right\}

f(t - t_o) \Leftrightarrow F(j\omega)\,e^{-j\omega t_o}

f(t - t_o)\,e^{j\omega_o t} \Leftrightarrow F(j(\omega - \omega_o))\,e^{-j(\omega - \omega_o)t_o}.
For a narrow-band system, i.e., a system for which the considered bandwidth is small
compared to the reciprocal signal-duration (∆B∆T < 1), let us assume that the magnitude
function A(ω) = A(ω0 ) is constant and located at the centre frequency ωo between ωo − ωg
and ωo + ωg . Only the phase function is frequency-dependent. For that, we assume a Taylor
series which we truncate after the linear term:
H(j\omega) = A(\omega)\,e^{j\phi(\omega)}
\approx A(\omega_o)\begin{cases} e^{j[\phi(\omega_o) + (\omega - \omega_o)\phi'(\omega_o)]} & ; |\omega - \omega_o| < \omega_g \\ e^{j[\phi(-\omega_o) + (\omega + \omega_o)\phi'(\omega_o)]} & ; |\omega + \omega_o| < \omega_g \end{cases}
= A(\omega_o)\begin{cases} e^{j[\phi(\omega_o) + (\omega - \omega_o)\phi'(\omega_o)]} & ; |\omega - \omega_o| < \omega_g \\ e^{-j[\phi(\omega_o) - (\omega + \omega_o)\phi'(\omega_o)]} & ; |\omega + \omega_o| < \omega_g \end{cases}
We now examine a narrowband excitation signal X(jω) which is transmitted over the given
narrowband system. The following Figure 1.4 clarifies the connection. In the upper picture,
we identify the system-characteristics, given by a constant transmission around the centre
frequency ωo . In the lower picture, the transmitted signal X(jω) is shown which has been
shifted to the centre frequency by a frequency converter (mixer). We now calculate the
[Figure 1.4: top: system characteristic around ±ω_o; bottom: spectrum X(j(ω−ω_o))/2 + X(j(ω+ω_o))/2 of the transmitted signal]

output signal Y(jω):

Y(j\omega) = \frac{A(\omega_o)}{2}\left\{ X(j(\omega - \omega_o))\,e^{j(\phi(\omega_o) + (\omega - \omega_o)\phi'(\omega_o))} + X(j(\omega + \omega_o))\,e^{-j(\phi(\omega_o) - (\omega + \omega_o)\phi'(\omega_o))} \right\}.
We identify two main parameters, which describe the behaviour of the narrowband system
fundamentally:
T_G = -\left.\frac{d\phi(\omega)}{d\omega}\right|_{\omega=\omega_o} = -\phi'(\omega_o) \quad ; \text{group delay}

T_P = -\frac{\phi(\omega_o)}{\omega_o} \quad ; \text{phase delay.}
Note that in the literature definitions with a positive sign (instead of the negative sign
used in these lecture notes) also occur. This results from defining the model as A(ω)e^{-jφ(ω)}
instead of our definition A(ω)e^{jφ(ω)}. Then the group delay (Ger.: Gruppenlaufzeit) and
the phase delay (Ger.: Phasenlaufzeit) become negative. The narrowband description has
the advantage that the (considerably costly) convolution is not needed. The transmission
behaviour is sufficiently described by simpler operations.
The group delay TG thus indicates how fast a small group of energy passes the system
around ωo . If groups pass at different frequencies ωo with different velocities, the signal
becomes distorted. To obtain an undistorted (and only delayed) signal, a constant group
delay has to be required: the system has to have a linear phase such that the derivative is
constant.
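As a small, hypothetical Python illustration (not from the script), a symmetric FIR filter yields a constant group delay while an asymmetric one does not; scipy's standard group-delay routine is used.

```python
import numpy as np
from scipy import signal

# Hypothetical FIR examples: a symmetric (linear-phase) filter vs. an asymmetric one
h_linphase = np.array([0.25, 0.5, 1.0, 0.5, 0.25])   # symmetric -> constant group delay
h_other    = np.array([1.0, 0.5, 0.25])              # not symmetric

w, gd1 = signal.group_delay((h_linphase, [1.0]), w=8)
_, gd2 = signal.group_delay((h_other, [1.0]), w=8)
print(gd1)    # constant (2 samples), i.e. (N-1)/2
print(gd2)    # varies with frequency -> dispersive
```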
The integer constant Δ > 0 is chosen such that a causal system is formed. We then
recognize that A(Ω) = G(e^{jΩ})G(e^{-jΩ}) and φ(Ω) = −ΔΩ. Thus, it is a system with linear
phase, independent of the choice of G(e^{jΩ}).
Let us therefore assume that G(e^{jΩ}) = 1 + ae^{-jΩ}, corresponding to the series g_k = {1, a},
together with Δ = 1. Then the following holds:

H(e^{jΩ}) = G(e^{jΩ})G(e^{-jΩ})\,e^{-jΩ} = a + (1 + a^2)e^{-jΩ} + a\,e^{-j2Ω}.

The impulse response of the filter thus is: h_k = {a, 1 + a², a}. This means that it is a
symmetric filter, and this also holds true independently of the choice of G(e^{jΩ}). Hence, we
conclude that symmetric filters which are constructed according to this rule are always of
linear phase⁸.
⁸ Does the opposite also hold? Find out for yourself. See Example 3.16.
Definition 1.3 If for some input signal the energy at the output is bigger than at the input,
the system is called active:

\max_\Omega |H(e^{j\Omega})| > 1.

See also the discussion of system gains in the context of norms in Chapter 2.
The gain can be measured, for example, if the ratio of input and output energy is examined:

\frac{\sum_{k=0}^{K} y_k^2}{\sum_{k=0}^{K} x_k^2} = \frac{y_k^T y_k}{x_k^T x_k} \overset{?}{<} 1.
Next to passive and active systems there also exists a limit case, that is, a system that is
neither active nor passive. We call this an allpass. For an allpass the method of comparing input
and output energy has to be questioned, as it needs to be checked whether this property
is fulfilled for all signals x_k, or just for some. Apparently, this description seems not very
practical, since all possible signals would have to be tested.
A better answer is delivered by the classical approach in the Fourier-domain. Let us
have a look at the following example.
Example 1.14 A linear, time-invariant, time-discrete system is given by its transfer function:
Y(e^{jΩ}) = H(e^{jΩ})X(e^{jΩ}).
Now let us assume that the maximum (minimum) of the transfer function occurs at a
specific frequency Ω_+ (Ω_−). Then, all signals X(e^{jΩ}) with spectral components unequal to
Ω_+ (Ω_−) will come out of such a system with less (more) energy than components at Ω_+ (Ω_−).
To ensure the same energy at the output and the input, the following must hold:

\max_{X(e^{j\Omega}) \neq 0} \frac{\|Y(e^{j\Omega})\|^2}{\|X(e^{j\Omega})\|^2} = \max_\Omega |H(e^{j\Omega})|^2 = \min_\Omega |H(e^{j\Omega})|^2 = 1.

If a system emits exactly the same amount of energy as it gathers, and thus no energy
is added or consumed, it is called an all-pass.

Definition 1.4 A linear time-invariant (finite order) system is called an allpass if the following
holds:

\min_\Omega |H(e^{j\Omega})| = \max_\Omega |H(e^{j\Omega})| = 1.
Example 1.15 Let 1/H^*(e^{jΩ}) be a causal, stable system. Then consider the following
linear time-invariant system:

G(e^{j\Omega}) = \frac{H(e^{j\Omega})}{H^*(e^{j\Omega})} = \frac{A(\Omega)e^{j\phi(\Omega)}}{A(\Omega)e^{-j\phi(\Omega)}} = e^{j2\phi(\Omega)}.

Theorem 1.5 The poles and zeroes of an allpass system are symmetrical with respect to the
unit circle. The phase of an allpass is monotonically decreasing, or equivalently, φ'(Ω) < 0.
H(z) = -a^* + z^{-1}

G(z) = a^*\,\frac{z - \frac{1}{a^*}}{z - a} = \frac{z a^* - 1}{z - a}

with a pole at z = a and a zero at 1/a^*. Its Fourier-transform reads:

G(e^{j\Omega}) = \frac{e^{j\Omega} a^* - 1}{e^{j\Omega} - a} = -e^{j\Omega}\,\frac{e^{-j\Omega} - a^*}{e^{j\Omega} - a}.

The magnitude of this is

|G(e^{j\Omega})| = 1

and thus we have an allpass.
and thus we have an allpass. Thus, for all polynomial combinations of zeroes and poles we
can only achieve the allpass property if we have a match of zero and pole (symmetrical
with respect to the unit circle means: pole at a, zero at 1/a∗ ). Concatenating higher
order systems provides the same result. Note that this is a property of an allpass, thus a
necessary condition, not a sufficient condition. For example, just multiply the allpass by
a constant (not equal to one) and it is not an allpass any more, but the poles and zeroes
remain unchanged.
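A quick numerical check (not from the script) of the allpass property and the decreasing phase, using the same example value a = 0.5 + 0.3j that appears in the figures below.

```python
import numpy as np

a = 0.5 + 0.3j                           # example pole, as used later in the script's figures
Omega = np.linspace(-np.pi, np.pi, 9)
z = np.exp(1j * Omega)

G = (z * np.conj(a) - 1.0) / (z - a)     # G(z) = (z a* - 1)/(z - a)
print(np.abs(G))                          # all ones: allpass
print(np.unwrap(np.angle(G)))             # monotonically decreasing phase
```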
In order to show the second property of the theorem, we have to take a small detour first.
If we take the system H(e^{jΩ}) = a^* − e^{-jΩ}
and apply the previous example, we obtain G(e^{jΩ}) = e^{j2φ(Ω)}. Figure 1.5 illustrates the phase
plot for a = 0.5 + 0.3j = R_a + jI_a with the phase function φ(Ω) = arc(H(Ω)):

\phi(\Omega) = \operatorname{atan}\frac{-I_a + \sin(\Omega)}{R_a - \cos(\Omega)} = \operatorname{atan}\frac{-0.3 + \sin(\Omega)}{0.5 - \cos(\Omega)}.
Figure 1.5 clarifies the phase relation if we sweep the frequency Ω from −π to π.
[Figure 1.5: plot of the phase functions 2φ(Ω) and ψ(Ω) over Ω ∈ [−π, π]]
The
angle φ(Ω) decreases (becomes more negative) if we go from Ω = −π to +π. In the figure
we also find the angle ψ(Ω) that monotonically decreases as well. Note that φ(Ω) is related
to ψ(Ω):
\psi(\Omega) = \operatorname{arc}\left(\frac{\frac{1}{a^*} - e^{j\Omega}}{a - e^{j\Omega}}\right)

and

G(e^{j\Omega}) = \frac{a^* - e^{-j\Omega}}{a - e^{j\Omega}} = -a^* e^{-j\Omega}\,\underbrace{\frac{\frac{1}{a^*} - e^{j\Omega}}{a - e^{j\Omega}}}_{\operatorname{arc}(.)=\psi(\Omega)}.
According to the Greek mathematician Apollonius (Apollonius of Perga [Pergaeus], ca. 262
BC – ca. 190 BC), who found this relationship for conjugate complex numbers (at a time
when they had not even been invented), the property of decreasing phase when moving along
a circle is called the circle of Apollonius. See Figure 1.6 for a better understanding.
Figure 1.6: Circle of Apollonius: starting at point P at Ω = −π in the left picture the angle
ψ(Ω) becomes more negative when moving to point P’ as shown in the right part of the
figure.
Consider now Figure 1.6. Let the center of the unit circle be denoted by C. Line
PN can be constructed by adding the lines PC = −e^{jΩ} and CN = a in a vector manner:
PN = a − e^{jΩ}, while PM = \frac{1}{a^*} − e^{jΩ}. The angle between lines PN and PM is ψ(Ω).
Why does P lie on the unit circle? For a better understanding of this, let us consider the
points P, M and N, such that M forms the point at which 1/a^* = \frac{R_a + jI_a}{R_a^2 + I_a^2} is located in the
complex plane, and N denotes the point at which a = R_a + jI_a is located. The point P
marks an arbitrary point on the unit circle. The complex values a and \frac{1}{a^*} have the same
angle in the complex plane.⁹
⁹ One can also compute this analytically by taking the derivative of the phase function. The straightforward
method is to compute \frac{\partial}{\partial\Omega}\operatorname{atan}\frac{\sin(\Omega)-I_a}{R_a-\cos(\Omega)}. One then finds −1 + R_a\cos(Ω) − I_a\sin(Ω) < 0 as long
as a lies inside the unit circle.
The conclusion of Apollonius is that all of these points P are located on the unit circle.
Thus, we obtain

k^2 = \frac{(R_p - R_a)^2 + (I_p - I_a)^2}{\left(R_p - \frac{R_a}{R_a^2 + I_a^2}\right)^2 + \left(I_p - \frac{I_a}{R_a^2 + I_a^2}\right)^2}.

With the approach R_p = R\cos(\alpha); I_p = R\sin(\alpha) the equivalent description in R and α can
be found:

k^2 = (R_a^2 + I_a^2)\,\frac{R^2 + R_a^2 + I_a^2 - 2R(R_a\cos(\alpha) + I_a\sin(\alpha))}{R^2(R_a^2 + I_a^2) + 1 - 2R(R_a\cos(\alpha) + I_a\sin(\alpha))}.

This equation is only solvable for R = 1, i.e., on the unit circle.
We have shown here in this example the property of a single stage allpass filter G(ejΩ ).
This property also holds if more than one of those systems are connected in a row as the
phases simply add up.
The polynomial is decomposed into two fragments No (ejΩ ) and Ni (ejΩ ), of which the zeroes
of the corresponding fragments of N (z) = Ni (z)No (z) either all lie inside, Ni (z), or all
outside of the unit circle, No (z). Note that the location of the zeroes is affected by the
complex conjugate operation. All zeroes of No∗ (z) lie inside of the unit circle, whereas all
zeroes of Ni∗ (z) lie outside of it. By cleverly converting,
we detached an all-pass H_a(e^{jΩ}) and a system H_m(e^{jΩ}) from which no all-pass can be separated
anymore. All zeroes of the system H_m(e^{jΩ}) are located inside the unit circle. It is
called a minimum phase system. Figure 1.7 shows the pole-zero plot of the low-pass with
z-transformed transfer function

H(z) = \frac{\left(z - \frac{1}{a^*}\right)\left(z - \frac{1}{a}\right)}{z(z - b)} = \frac{\left(1 - \frac{1}{a^*}z^{-1}\right)\left(1 - \frac{1}{a}z^{-1}\right)}{1 - b z^{-1}}    (1.24)

with the parameters a = 0.5 + 0.3j and b = 0.8. We find the Fourier transform simply by
replacing z = e^{jΩ}:

H\!\left(e^{j\Omega}\right) = \frac{\left(e^{j\Omega} - \frac{1}{a^*}\right)\left(e^{j\Omega} - \frac{1}{a}\right)}{e^{j\Omega}\left(e^{j\Omega} - b\right)}
Figure 1.7: Conversion of a low-pass filter into an all-pass and a minimum phase low-pass
filter. Crosses denote poles and circles denote zeros.
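The decomposition can also be carried out numerically; the Python sketch below (not from the script) takes a hypothetical FIR filter, reflects the zero lying outside the unit circle to its conjugate-reciprocal position and confirms that the magnitude response is unchanged.

```python
import numpy as np
from scipy import signal

# Hypothetical FIR filter with one zero outside the unit circle
h = np.array([1.0, -2.5, 1.0])                 # H(z) = 1 - 2.5 z^-1 + z^-2, zeros at 2 and 0.5
zeros = np.roots(h)

# Minimum-phase part: reflect zeros outside the unit circle to 1/z0* positions
outside = np.abs(zeros) > 1
zeros_min = np.where(outside, 1.0 / np.conj(zeros), zeros)
h_min = np.real(h[0] * np.prod(np.abs(zeros[outside])) * np.poly(zeros_min))

w, H = signal.freqz(h, worN=64)
_, Hm = signal.freqz(h_min, worN=64)
print(np.allclose(np.abs(H), np.abs(Hm)))      # True: same magnitude response
print(h_min)                                    # [2., -2., 0.5]: all zeros now inside
```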
Figure 1.8: Phase plot of a linear transmission system H(e^{jΩ}) from Example 1.16 (low-pass H(z) = a^* − z^{-1} and the corresponding minimum phase system H_m(z) = 1 − a z^{-1}). Left: phase, right: derivative of phase.
In the right part of the picture, it can be observed that φ'_m(Ω) ≥ φ'(Ω), thus it is smaller in magnitude (more
positive means also closer to zero)!
Theorem 1.6 Given two causal LTI systems with the same amplitude function but different
phase functions, a minimum phase system has the property φ'(Ω) < φ'_m(Ω).

Proof: Consider a second rational transmission system H(e^{jΩ}) with phase φ(Ω) under the
condition:

|H(e^{jΩ})| = |H_m(e^{jΩ})|.

Then, furthermore, it must hold that

\frac{H(e^{j\Omega})}{H_m(e^{j\Omega})} = e^{j(\phi(\Omega) - \phi_m(\Omega))} = H_a(e^{j\Omega}),

representing an all-pass. For such a system, however, we have just learned that its phase must
satisfy φ'_a(Ω) < 0 as it is an all-pass. Likewise, we can see that

φ_a(Ω) = φ(Ω) − φ_m(Ω),

or, equivalently

φ'_a(Ω) = φ'(Ω) − φ'_m(Ω) < 0,

and the statement of the theorem follows.
Recall that the bi-linear transform uniquely maps zeroes and poles from the left half
plane into the unit circle. Therefore, we can conclude similar results for the minimum
phase property of time-continuous systems. Rather than sorting the zeroes into two sets,
one inside and one outside the unit circle, for time-continuous systems we have two sets,
one with zeroes in the left and one with zeroes in the right half plane.
This can be shown by a Fourier-series expansion. The functions δ(ω − 2nπ/T) are periodic
in 2π/T and can thus be expanded into a Fourier series of the form:

\sum_{n=-\infty}^{\infty} \delta(\omega - 2n\pi/T) = \sum_{k=-\infty}^{\infty} c_k \exp(jk\omega T).    (1.25)
Proof:
Consider an equidistant sampling with interval T . According to the theorem, there must
exist an interpolation function p(t) with which f (t) can be obtained from the series fk .
\hat{f}(t) = \sum_{k=-\infty}^{\infty} f(kT)\,T\,p(t - kT) = T\sum_{k=-\infty}^{\infty} f_k\,p(t - kT),
where we introduced the reconstructed signal f̂(t). We would like to investigate under which conditions
f̂(t) = f(t), or equivalently, F̂(jω) = F(jω). The above equation can be interpreted as
a clever approach in which the original question is now reduced to the query how the
interpolator p(t) has to be formed in order to obtain f̂(t) = f(t). To answer this, we
consider the Fourier transform F̂(jω) of the continuous function f̂(t):
\hat{F}(j\omega) = T\sum_{k=-\infty}^{\infty} f_k\,P(j\omega)\,e^{-jk\omega T} = P(j\omega)\,T\sum_{k=-\infty}^{\infty} f_k\,e^{-jk\omega T} = P(j\omega)\,F_A(j\omega).
Hence:

\hat{F}(j\omega) = P(j\omega)\sum_{k=-\infty}^{\infty} F\!\left(j\left(\omega + \frac{2k\pi}{T}\right)\right).

This equation adopts a practical form if F(jω) is band-limited with |ω| < π/T. In that case,
F(jω) = F̂(jω) for |ω| < π/T and P(jω) = 1 in this region. Thus, for the interpolator, the
following must hold:

P(j\omega) = \begin{cases} 1 & ; |\omega| < \pi/T \\ 0 & ; \text{else} \end{cases}.
However, this is nothing more than an ordinary ideal low-pass. Hence:

P(j\omega) \Leftrightarrow p(t) = \frac{\sin\left(\frac{\pi t}{T}\right)}{\pi t} = \frac{1}{T}\operatorname{sinc}\left(\frac{\pi t}{T}\right).

f(t) = \sum_{k=-\infty}^{\infty} f_k\,\operatorname{sinc}\left(\frac{\pi}{T}(t - kT)\right).    (1.27)
However, this equation allows for other solutions as well. Consider a band-limited function
F (jω) in |ω+k2π/T | < π/T and a band-pass P (jω) in that domain. That is also a solution.
Figure 1.9 clarifies the situation.
Note that the sampling theorem provides a sufficient condition not a necessary one.
This explains why literally hundreds of variants exist. We will re-interpret our result in
Equation (1.27) in terms of metrics, see Exercise 2.11 and approximations, see Exercise 3.22.
Note also that the interpolation equation (1.27) allows a so-called "re-sampling":

f(mT_1) = \sum_{k=-\infty}^{\infty} f_k\,\operatorname{sinc}\left(\frac{\pi}{T}(mT_1 - kT)\right).
Consequently, series that have been sampled with interval T , can be converted to series
with sampling interval T1 . However, note that in doing so, the sampling theorem may not
be violated.
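A small Python sketch (not from the script) of re-sampling via (1.27); the signal, intervals and truncation of the infinite sum are all illustrative choices.

```python
import numpy as np

T, T1 = 1.0, 0.25                       # original and new sampling intervals
k = np.arange(-32, 33)

# Band-limited example signal sampled with interval T (frequency below pi/T)
f_k = np.cos(0.3 * np.pi * k * T)

def sinc_interp(t, f_k, k, T):
    """Evaluate f(t) = sum_k f_k * sinc(pi/T * (t - kT)) -- Equation (1.27)."""
    # np.sinc(x) = sin(pi x)/(pi x), so sinc(pi/T*(t-kT)) corresponds to np.sinc((t-kT)/T)
    return np.sum(f_k * np.sinc((t - k * T) / T))

m = np.arange(0, 9)
resampled = np.array([sinc_interp(mm * T1, f_k, k, T) for mm in m])
print(resampled)
print(np.cos(0.3 * np.pi * m * T1))     # close to the true values (truncation causes small errors)
```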
[Figure 1.9: spectra F(jω) and F_A(jω) over ω, with marks at ±2π/T and ±4π/T]
How does the interpolation function now have to be constructed so that f(t) = f̂(t)?
We examine again the Fourier transform, of which we know that it has a periodic
spectrum:

\hat{F}(j\omega) = P(j\omega)\sum_{k=-\infty}^{\infty} G\!\left(j\left(\omega + \frac{2k\pi}{T}\right)\right) = P(j\omega)\sum_{k=-\infty}^{\infty} F\!\left(j\left(\omega + \frac{2k\pi}{T}\right)\right) H\!\left(j\left(\omega + \frac{2k\pi}{T}\right)\right).
This equation can be solved by choosing a band-limited version of 1/H(jω) for P (jω) to
compensate for the effect of the prefiltering in H(jω):
p(t) = \frac{T}{2\pi}\int_{-\pi/T}^{\pi/T} \frac{\exp(j\omega t)}{H(j\omega)}\,d\omega \;\Leftrightarrow\; P(j\omega) = \begin{cases} 1/H(j\omega) & ; |\omega| < \frac{\pi}{T} \\ 0 & ; \text{else} \end{cases}.
This idea can be generalized, see Figure 1.10. For each sampled signal and its spectrum,
[Figure 1.10: the signal f(t) is passed through m filters H_k(jω); each output g_k(t) is sampled with period T]
The question that arises is the following: is it possible to reconstruct the
function f(t) from the values g_k(nT); k = 1,2,...,m, sampled in the interval T (thus m samples
per period T), with a linear interpolation?

f(t) = \sum_{n=-\infty}^{\infty}\sum_{k=1}^{m} g_k(nT)\,p_k(t - nT).    (1.28)
Figure 1.11: Non-equidistant but periodic sampling with period T (samples at 0, T/m, ..., T).
Example 1.20 Let Hk (jω) k = 1,2,..m be arbitrary linear systems. Is it possible to derive
a conclusion concerning these transfer functions, so that the interpolation equation (1.28)
is realizable?
The solution for this problem was found by Papoulis [2], and can be summarized in the
following theorem:
is solvable for all ω (determinant unequal to zero). The interpolators are then given by

p_k(t) = \frac{T}{2\pi}\int_{-m\frac{2\pi}{T}}^{-m\frac{2\pi}{T}+\frac{2\pi}{T}} Y_k(\omega,t)\,e^{j\omega t}\,d\omega.    (1.30)

Note that the terms Y_k(ω,t) do not describe the Fourier transform of p_k(t).
Proof:
Let us first consider the response to f(t) which occurs at the output of H_k(jω): g_k(t),
sampled at the time instant nT, thus g_k(t − nT). This is equal to an excitation f(t − nT)
of the system H_k(jω), and also identical to the response of the system H_k(jω) exp(jnωT)
excited by f(t). So, if
f(t) = \sum_{n=-\infty}^{\infty}\sum_{k=1}^{m} g_k(nT)\,p_k(t - nT)
should hold, then for the special excitation f(τ) = exp(jω_0(t+τ)) and its Fourier transform
F(jω) = exp(jω_0 t)δ(ω − ω_0), the following must hold:

\exp(j\omega_0(t+\tau)) = \sum_{k=1}^{m}\sum_{n=-\infty}^{\infty} g_k(nT)\,p_k(\tau - nT)
 = \sum_{k=1}^{m}\sum_{n=-\infty}^{\infty} \frac{1}{2\pi}\int_{-\infty}^{\infty} G_k(j\omega')\,e^{jn\omega' T}\,d\omega'\; p_k(\tau - nT)
 = \sum_{k=1}^{m}\sum_{n=-\infty}^{\infty} \frac{1}{2\pi}\int_{-\infty}^{\infty} H_k(j\omega')F(j\omega')\,e^{jn\omega' T}\,d\omega'\; p_k(\tau - nT)
 = \sum_{k=1}^{m}\sum_{n=-\infty}^{\infty} \frac{1}{2\pi}\int_{-\infty}^{\infty} H_k(j\omega')\,e^{j\omega_0 t}\delta(\omega' - \omega_0)\,e^{jn\omega' T}\,d\omega'\; p_k(\tau - nT)
 = \sum_{k=1}^{m}\sum_{n=-\infty}^{\infty} H_k(j\omega_0)\,e^{j\omega_0 t}\,e^{jn\omega_0 T}\,p_k(\tau - nT).
\sum_{k=1}^{m} H_k(j\omega)\,Y_k(j\omega,\tau) = 1.
Further m − 1 linear equations can be obtained by determining the Fourier series (1.33)
in the other intervals \left[-(m-l)\frac{2\pi}{T},\; -(m-l)\frac{2\pi}{T} + \frac{2\pi}{T}\right]; l = 1,2,\dots,m-1, so that we obtain the
following system of equations:
\begin{pmatrix}
H_1(j\omega) & H_2(j\omega) & \dots & H_m(j\omega) \\
H_1\!\left(j\left(\omega + \frac{2\pi}{T}\right)\right) & H_2\!\left(j\left(\omega + \frac{2\pi}{T}\right)\right) & \dots & H_m\!\left(j\left(\omega + \frac{2\pi}{T}\right)\right) \\
H_1\!\left(j\left(\omega + 2\frac{2\pi}{T}\right)\right) & & & \vdots \\
\vdots & & & \vdots \\
H_1\!\left(j\left(\omega + (m-1)\frac{2\pi}{T}\right)\right) & \dots & & H_m\!\left(j\left(\omega + (m-1)\frac{2\pi}{T}\right)\right)
\end{pmatrix}
\begin{pmatrix} Y_1(j\omega,t) \\ Y_2(j\omega,t) \\ \vdots \\ Y_m(j\omega,t) \end{pmatrix}
=
\begin{pmatrix} 1 \\ \exp\!\left(\frac{j2\pi t}{T}\right) \\ \exp\!\left(2\frac{j2\pi t}{T}\right) \\ \vdots \\ \exp\!\left((m-1)\frac{j2\pi t}{T}\right) \end{pmatrix}.
Appendix
As the derivation of the auxiliary functions in (1.26) is a bit lengthy, we moved it to this
appendix. Consider the Fourier pair:
f_A(t) = T\sum_{k=-\infty}^{\infty} f_k\,\delta(t - kT)

F_A(j\omega) = \sum_{k=-\infty}^{\infty} F\!\left(j\left(\omega + \frac{2k\pi}{T}\right)\right).
The identity:

F_A(j\omega) = T\sum_{k=-\infty}^{\infty} f_k\exp(-jk\omega T) = \sum_{n=-\infty}^{\infty} F\!\left(j\left(\omega + \frac{2n\pi}{T}\right)\right),
holds because f_k is the inverse Fourier transform of F(jω') evaluated at time instant t = kT. With that we find:
F_A(j\omega) = T\sum_{k=-\infty}^{\infty} f_k\exp(-jk\omega T)
 = \sum_{k=-\infty}^{\infty} \frac{T}{2\pi}\int_{-\infty}^{\infty} F(j\omega')\exp(jk\omega' T)\,d\omega'\,\exp(-jk\omega T)
 = \int_{-\infty}^{\infty} F(j\omega')\sum_{k=-\infty}^{\infty} \frac{T}{2\pi}\exp(jk(\omega' - \omega)T)\,d\omega'
 = \int_{-\infty}^{\infty} F(j\omega')\sum_{n=-\infty}^{\infty} \delta(\omega' - \omega - 2n\pi/T)\,d\omega'
 = \sum_{n=-\infty}^{\infty} F\!\left(j\left(\omega + \frac{2n\pi}{T}\right)\right).
1.4 Exercises
Exercise 1.1 Linearity can be defined by the following two constraints:
S[αx] = αS[x]
S[x1 + x2 ] = S[x1 ] + S[x2 ].
Show that the superposition form can be deduced from those and vice versa.
Exercise 1.3 Consider the theorem of Bezout (1.7) for the case H_i(q^{-1}) = h_0^{(i)} + h_1^{(i)} q^{-1};
i = 1,...,n_I, and G_i(q^{-1}) with arbitrary impulse response lengths n_G. How many unknowns
and how many equations do you find, depending on n_I and n_G?
Exercise 1.5 It is well known that linear time-invariant systems are commutative. Derive
the system’s transfer function by the polynomial approach, a z-transform and the state-space
approach for two concatenated linear time-invariant systems. Which description is easier?
Exercise 1.6 Derive the Hilbert-transform (relation between real- and imaginary part of
causal functions) for time-discrete series.
Exercise 1.7 Let H(jω) = a + jb; a,b ∈ R. Under which constraints is H(jω) causal?
What changes, if a time-discrete system is given, i.e., H(ejΩ ) = a + jb; a,b ∈ R?
Exercise 1.8 Symmetrical filters show a linear phase. Is it possible to conclude that linear-phase
filters are symmetrical? Which sorts of symmetries do you know? Which forms of linear
phase filters can be deduced from those?
Exercise 1.9 We examine the transfer function of a square-root raised cosine filter, as it
is typically used for pulse shaping in UMTS and WiMAX.

H(j\omega) = \begin{cases} 1 & , |\omega| \le (1-\alpha)\frac{\pi}{T_S} \\ \sqrt{\frac{1}{2}\left[1 - \sin\left(\frac{T_S}{2\alpha}\left(|\omega| - \frac{\pi}{T_S}\right)\right)\right]} & , (1-\alpha)\frac{\pi}{T_S} \le |\omega| \le (1+\alpha)\frac{\pi}{T_S} \\ 0 & , |\omega| \ge (1+\alpha)\frac{\pi}{T_S} \end{cases}    (1.34)

Let the roll-off factor α be 0.22. What is the corresponding impulse response? Is it a causal
filter? Does the filter show a linear phase?
Exercise 1.11 Have a look at the derivation of the minimum phase system. Is it possible
to create a maximum phase system for which all zeroes are located outside the unit circle?
Exercise 1.13 Consider a linear phase filter. Can you design such a filter that is of minimum phase?
Exercise 1.14 Consider the following linear phase filter with amplitude function
Example 2.1 In a first example we consider the transmission of binary data as depicted
in Figure 2.1. Here, k bits are combined in a codeword x ∈ C ⊂ IB^n = Y, pointing at one
of M possible outcomes. Thus, 2^k = M. As the data is corrupted during the transmission
process (i.e., some bits are flipped from zero to one or vice versa), the information needs
to be protected. Such protection is obtained by adding redundancy. Here n − k additional
bits are being added following a particular coding rule (block code). The codewords x thus
contain n > k bits. Nevertheless, there are only M = 2^k possible codewords and not 2^n!
The received vectors y ∈ IB n are passed to the decoder whose task is to find that codeword
that is closest to the received vector. Figure 2.2 depicts potential received values (red) and
expected codewords (green). How does the decoder decide which received word is close to
which codeword? It has to compare the received vector y to all allowed codewords x and
measure the distance to each of them. The codeword x with the smallest distance is the
most likely transmitted symbol. The distance is here simply the number of differing bits.
¹ Note that the dimension of a vector does not define the dimension of the vector space in which it lives.
Example 2.2 In our second example, we transfer two possible signal forms from two sen-
sors. Take for example a modern car with wireless pressure sensors in each wheel. To
distinguish the left from the right wheel signals, they transmit their information by different
signatures, say f (t) and g(t). The receiver has to decide whether the received signal form
r(t) is closer to f (t) or to g(t). We thus need a distance measure from r(t) to f (t) and r(t)
to g(t):
d_2(r(t),f(t)) = \sqrt{\int_a^b |r(t) - f(t)|^2\,dt} \quad\text{vs.}\quad d_2(r(t),g(t)) = \sqrt{\int_a^b |r(t) - g(t)|^2\,dt}.
0) d(x,y) ≥ 0
1) d(x,y) = d(y,x)
2) d(x,y) = 0 iff x = y
3) d(x,y) ≤ d(x,z) + d(z,y).
[Figure 2.1: an (n,k) block code maps codewords x to received words y; M = 2^k, # info bits k = log₂(M), # code bits n ≥ k, code rate R = k/n = log₂(M)/n.]
Note that 0) is not really required since it follows from the other three: we have d(x,x) = 0 from 2). Furthermore, 0 = d(x,x) ≤ d(x,y) + d(y,x) = 2d(x,y), where we have used 1) and 3). We can thus conclude 0): d(x,y) ≥ 0.
Definition 2.2 (Metric space) A metric space (X,d) is described by a pair, comprising
a set X and a metric d, valid on this set.
Example 2.3 Consider the following metric d₁(·,·) : IRⁿ × IRⁿ → IR₀⁺:
d₁(x,y) = Σ_{i=1}^{n} |x_i − y_i|,
defined over the n-dimensional vectors x,y. This metric is called l1 metric. It is often used
as it requires very little computational complexity. In literature it is sometimes referred to
as the Manhattan distance as walking along the street grid of Manhattan would provide a
similar distance.
Example 2.4 We can generalize the previous to d_p(·,·) : IRⁿ × IRⁿ → IR₀⁺:
d_p(x,y) = ( Σ_{i=1}^{n} |x_i − y_i|^p )^{1/p},
which turns out to be a valid metric for every 1 ≤ p < ∞. Here, of particular interest are
l₁, l₂ (Euclidean metric), and d_∞(·,·) : IRⁿ × IRⁿ → IR₀⁺:
d_∞(x,y) = lim_{p→∞} ( Σ_{i=1}^{n} |x_i − y_i|^p )^{1/p} = max_{1≤i≤n} |x_i − y_i|.
It is relatively straight-forward to show that the last equivalence holds. Consider the largest
difference ∆ = max |xi − yi | at index I = arg max |xi − yi | and take it out:
Σ_{i=1}^{n} |x_i − y_i|^p = Δ^p + Σ_{i=1, i≠I}^{n} |x_i − y_i|^p = Δ^p ( 1 + Σ_{i=1, i≠I}^{n} ( |x_i − y_i| / Δ )^p ).
Eventually, computing the limit for p → ∞ we find that all terms (|x_i − y_i|/Δ)^p < 1 go to zero with growing p when compared to the leading one. Thus only Δ remains, which is the desired result.
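This limit behaviour is also easy to observe numerically; a small sketch (not from the script) with an arbitrarily chosen pair of vectors:

import numpy as np

x, y = np.array([1.0, -2.0, 0.5]), np.array([0.0, 1.0, 0.25])

def d_p(x, y, p):
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

print(d_p(x, y, 1))                      # l1 (Manhattan) distance: 4.25
print(d_p(x, y, 2))                      # Euclidean distance
for p in (2, 5, 20, 100):
    print(p, d_p(x, y, p))               # approaches max |x_i - y_i| = 3 as p grows
print(np.max(np.abs(x - y)))             # the d_infinity value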
Example 2.5 A metric on binary vectors is the Hamming distance d_H(x,y) = Σ_{i=1}^{n} (x_i ⊕ y_i), which counts the number of differing bits, where ⊕ refers to the antivalence operation (exclusive or), i.e., (x_i ⊕ y_i) = 1 only if x_i ≠ y_i.
Example 2.6 The set IRn (vectors with n entries from IR) together with the metric d2 (x,y)
builds a metric space.
A metric thus allows for statements about sizes (lengths, areas, volumes, etc. for p = 2)
of objects. Tied to this is the existence of such objects, or, equivalently whether they fit
into the given space. Given an object x of the space, we thus need to test if it fits into the
space, i.e., if
dp (x,0) = dp (x) < ∞.
For l_p-metrics, we can also leave out the p-th root as it is a monotone mapping, i.e., it is sufficient to show that
Σ_{i=1}^{n} |x_i|^p < ∞.
Example 2.7 Consider the causal, infinitely long series of real or complex-valued numbers
xi , i = 0,1,...,∞, with the property
Σ_{i=0}^{∞} |x_i|^p < ∞.
For each p for which the condition is satisfied, the sequence belongs to the corresponding
metric space lp (0,∞) with metric dp . (Ger.: Folgenraum).
Consider on the other hand the sequence x_i, i = −∞,...,−1,0,1,...,∞ with the same properties and the same metric; then we have the l_p(−∞,∞) space. Considering the metric for p → ∞, for sequences for which |x_i| is bounded for every i, i.e., |x_i| < M, we obtain the l_∞(0,∞) or l_∞(−∞,∞) space, respectively, with
d_∞(x,0) = sup_i |x_i| .
Here, a number z ∈ IR is an upper bound of a set S if
z ≥ x_i ; for all i,
and the smallest upper bound is called the supremum sup(S). If there is no number in IR that is larger than the largest element in S, then we have sup(S) = ∞. Likewise, z is a lower bound if
z ≤ x_i ; for all i,
and the largest lower bound is called the infimum inf(S). If there is no number in IR that is smaller than the smallest element in S, then we have inf(S) = −∞.
Example 2.8 Let S = (3,6) be an open set, then: inf(S) = 3, sup(S) = 6. Let T = [4,8),
then: inf(T ) = 4, sup(T ) = 8. Let U = [2,∞), then: inf(U ) = 2, sup(U ) = ∞.
Example 2.9 Metric spaces can also be formed with particular properties. Let l_h(0,∞) be the metric space of sequences whose quadratic sum is finite (finite energy sequences). Thus, x_i ∈ l_h(0,∞) = l_{p=2}(0,∞) means that
Σ_{i=0}^{∞} |x_i|² < ∞,
thus the sequences of bounded energy. Note that Gaussian noise, a dominant noise model in telecommunications, is not a member of this metric space.
Metric spaces can also be defined over functions rather than a set of numbers (note
that functions are in a way sets of numbers). These are called metric spaces over functions
(Ger.: Funktionenraum).
[Figure 2.3: Upper plot: x_o(t) and x_m(t) lie within a band x_o(t) ± ε, so both d₂(x_o,x_m) < ε and d_∞(x_o,x_m) < ε. Lower plot: x₁(t) and x₂(t) agree with x_o(t) except at isolated points, so d₂(x_o,x₁) < ε and d₂(x_o,x₂) < ε, while d_∞(x_o,x₁) ≫ ε and d_∞(x_o,x₂) ≫ ε.]
The metric over the functions x(t) and y(t) from X is then³
d_p(x,y) = ( ∫_a^b |x(t) − y(t)|^p dt )^{1/p} ;   1 ≤ p < ∞.
The space with the metric d_p over the functions is called L_p space. For p → ∞ we have for bounded functions (from above and below)
d_∞(x,y) = sup_{t∈[a,b]} |x(t) − y(t)| .
In Figure 2.3 we see some examples of metrics. In the upper graph we find two functions x_o(t) and x_m(t). Both functions are relatively close to each other. This is well expressed by the metrics: no matter whether a d₂ or a d_∞ metric is selected, both give a small value. The d_∞ metric can be found by moving x_o(t) up and down until the shifted copies are the tightest upper and lower bounds to x_m(t). In the lower picture the same function x_o(t) is displayed, but now two other functions x₁(t) and x₂(t) are shown. They are almost identical to x_o(t) but at isolated points they differ. Computing the d₂ metric, however, does not show any discrepancy, as integrating over a single point does not deliver any contribution. Taking the d_∞ metric here makes a substantial difference, as now the outlier defines the value of the metric. If we are thus interested in measuring outliers, a d_∞ metric is the right choice.
2.1.1 Sparsity
In Figure 2.4 we consider a 2-dimensional vector whose d_p-metric is constant. Constant metrics are called iso-metrics (from the Greek word isos = equal)⁴. For p = 2 we find a circle (here only the right quadrant is shown). For larger values of p we see the metric inflate, that is, it becomes more blown up, and eventually for p → ∞ a square shape is obtained. But what happens for values of p smaller than two? For p = 1 we find a straight line. Decreasing p further makes the metric deflate.
What happens if we decrease p below 1? The d_p metric definition does not hold any longer, as the next example shows.
Example 2.10 Consider p = 1/2 and select the three points x = (1,0), y = (0,1), z = (0.5,0.3). Now let us compute d_{1/2}:
( Σ_{i=1}^{2} |x_i − y_i|^{1/2} )² = 4 > ( Σ_{i=1}^{2} |x_i − z_i|^{1/2} )² + ( Σ_{i=1}^{2} |z_i − y_i|^{1/2} )² = 3.96.
³ In terms of a Lebesgue integral.
⁴ You may remember iso-bar lines that show lines of equal air pressure.
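The numbers of Example 2.10 can be reproduced with a few lines (an illustrative sketch, not part of the script):

import numpy as np

def d_half(x, y):
    # "d_{1/2}": (sum |x_i - y_i|^{1/2})^2 -- not a metric, as the check below shows
    return np.sum(np.sqrt(np.abs(np.asarray(x) - np.asarray(y)))) ** 2

x, y, z = (1.0, 0.0), (0.0, 1.0), (0.5, 0.3)
print(d_half(x, y))                    # 4.0
print(d_half(x, z) + d_half(z, y))     # approx. 3.96 < 4: the triangle inequality is violated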
[Figure 2.4 shows the curves d_p(x) = |x₁|^p + |x₂|^p = 1 for p = 1/4, 1/2, 1, 2, 4, 100: the unit ball inflates for growing p and deflates for shrinking p.]
Figure 2.4: Varying p: inflating and deflating metrics.
For p → 0 the sum Σ_i |x_i|^p approaches the number of nonzero entries of x, which is truly a counter for sparseness. But note, mathematically speaking, this is not a metric (norm); sometimes it is called a pseudonorm. Mathematical difficulties with this definition occur due to the discontinuity of
x⁰ = { 0 ; x = 0
       1 ; else.
Example 2.11 Consider the following application of metrics and sparseness. Given an interpolator p(t) = sinc(πt/T), we want to sample a continuous function f(t) such that the corresponding sequence g_k minimizes the quadratic distance to the original function, i.e., we want
min_{g_k} d₂( f(t), Σ_{k=−∞}^{∞} g_k p(t − kT) ) = min_{g_k} ( ∫_{−∞}^{∞} | f(t) − Σ_{k=−∞}^{∞} g_k sinc( (π/T)(t − kT) ) |² dt )^{1/2} .
We can now vary the problem by providing a set of possible interpolation functions, say {p₁(t), p₂(t), ..., p_M(t)}. We now like to find a sparse representation of g_k:
min_{g_k} d₀( g_k | {p_m(t)} ) ;   such that   f(t) = Σ_{k=0}^{k₀} g_k p_m(t − kT) .
Alternatively, we can vary the problem by providing a set of possible sampling sequences {g_{1,k}, g_{2,k}, ..., g_{m,k}}, if the interpolator is given in terms of a digital filter followed by a fixed interpolator, i.e.,
p(t) = Σ_{k=0}^{P−1} p_k sinc( (π/T)(t − kT) ) .
A part of the interpolator is thus given in terms of the P coefficients p0 ,p1 ,...,pP −1 .
We can now formulate the sampling problem as
min_{p_k} d₂( f(t), Σ_{k=0}^{k₀} g_{m,k} p(t − kT) ) .
The sparsity is thus provided by the set of sparse sampling sequences {g1,k ,g2,k ,...,gm,k } and
does not need to be included in the optimization process. The desired level of sparsity (how
many entries in gk are non-zero) is already introduced in the initial design. We thus have
to solve now
min_{g_{m,k}} min_{p_l} d₂( f(t + t_D), Σ_{k=0}^{k₀} g_{m,k} p(t − kT) )
= min_{g_{m,k}} min_{p_l} ( ∫_0^{(P+k₀)T} | f(t + t_D) − Σ_{k=0}^{k₀} Σ_{l=0}^{P−1} g_{m,k} p_l sinc( (π/T)(t − lT − kT) ) |² dt )^{1/2} .
In terms of g_{m,k} it is a trial-and-error process; the size of the set defines the complexity of the search algorithm. In terms of the coefficients p_k, however, it is a d₂-metric problem that we will later learn to solve by so-called Least-Squares methods. This method is the principle of the so-called codebook excited linear predictive coding (CELP) that is nowadays a common standard for speech coding and can be found in every speech codec (cell phones, digital telephony). The interpolator is the linear prediction mechanism (indicated by the future value f(t + t_D); t_D > 0; typically t_D = T), and the pre-stored set of excitations g_{m,k} is the codebook from which we select.
Let us summarize the various metric spaces in Table 2.1 as they are commonly used in
the literature.
Example 2.12 Let us apply the metric to a filter design: a linear time-discrete filter with linear phase,
H(e^{jΩ}) = a₀ + a₁ e^{jΩ} + a₁ e^{−jΩ} = a₀ + 2a₁ cos(Ω),
is to be designed such that it follows a desired amplitude function
|H_d(e^{jΩ})| = { 1 ; for |Ω| < Ω_G
                  0 ; else
in the d₂ sense, i.e., we minimize (1/2π) ∫_{−π}^{π} ( |H_d(e^{jΩ})| − a₀ − 2a₁ cos(Ω) )² dΩ. Note that all terms in the desired values a₀, a₁ are quadratic. We can thus solve the problem by differentiating with respect to a₀ and a₁ individually. Setting the resulting equations to zero delivers the desired results. We find
a₀ = Ω_G / π ,
a₁ = sin(Ω_G) / π .
Compare the result with the coefficients of the Fourier series: they are the first two coefficients of a Fourier series. The result is depicted in Figure 2.6 further ahead (left upper part). Not surprisingly, the obtained filter solution only weakly resembles the low-pass character. Too few coefficients were spent.
Example 2.13 Let us return to the problem of Paley-Wiener. Given an amplitude function A_d(Ω), what is a valid filter F(e^{jΩ}) that satisfies it? We cannot use the technique of the previous example, as it requires the knowledge of the desired filter function in amplitude and phase. However, if we consider linear phase filters as solution, we remember that for those we can write
F(e^{jΩ}) = A_F(Ω) e^{−jΩΔ}
with Δ a positive constant, indicating the linear phase. We now recall that linear phase filters have symmetric impulse responses, for example
F(e^{jΩ}) = e^{−jΩN} ( f₀ + Σ_{n=1}^{N} f_n (e^{−jnΩ} + e^{jnΩ}) )
          = e^{−jΩN} ( f₀ + Σ_{n=1}^{N} 2 f_n cos(nΩ) )
          = e^{−jΩN} Σ_{n=0}^{N} h_n cos(nΩ),
where we took the linear phase part in front, thus Δ = N. In the last line we rewrote the terms by introducing new coefficients h_n = 2f_n for n = 1,2,...,N and h₀ = f₀. Now we are ready to compute the coefficients h_n based on a d₂ metric:
min_{h_n} (1/2π) ∫_{−π}^{π} ( A_d(Ω) − Σ_{n=0}^{N} h_n cos(nΩ) )² dΩ .
As in the previous example we compute the derivatives, taking advantage of the following property:
(1/2π) ∫_{−π}^{π} cos(lΩ) cos(nΩ) dΩ = { 1/2 ; l = n
                                         0   ; else
(for l = n = 0 the integral equals 1).
Thus, given the amplitude function A_d(Ω), we now know how to compute a corresponding filter function. This is of course not the only solution. Any additional allpass can vary the phase without changing the amplitude function. By this, also minimum phase solutions can be found, employing the techniques of the previous chapter.
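A numerical sketch of this design procedure (my own illustration, not from the script): solving the minimization with the above orthogonality property gives h₀ = (1/2π)∫A_d(Ω)dΩ and h_n = (1/π)∫A_d(Ω)cos(nΩ)dΩ for n ≥ 1; for the ideal low pass of Example 2.12 this reproduces a₀ = Ω_G/π and h₁ = 2a₁ = 2 sin(Ω_G)/π. The cutoff Ω_G and the order N below are assumptions for illustration.

import numpy as np

Omega_G, N = 0.4 * np.pi, 8                          # desired cutoff and filter order (assumed)
Omega = np.linspace(-np.pi, np.pi, 20001)
A_d = (np.abs(Omega) < Omega_G).astype(float)        # ideal low-pass amplitude function

# LS-optimal coefficients from the cosine orthogonality relation
h = np.zeros(N + 1)
h[0] = np.trapz(A_d, Omega) / (2 * np.pi)
for n in range(1, N + 1):
    h[n] = np.trapz(A_d * np.cos(n * Omega), Omega) / np.pi

print(h[0], Omega_G / np.pi)                         # both approx. 0.4
print(h[1], 2 * np.sin(Omega_G) / np.pi)             # h_1 = 2*a_1 of Example 2.12
A_F = sum(h[n] * np.cos(n * Omega) for n in range(N + 1))   # resulting amplitude function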
We only add rational numbers (e.g., Σ_{k=1}^{∞} 1/k²), but once k → ∞ the final result is π²/6, which is clearly not in Q. We thus would like to know more about where such infinite series end.
Definition 2.8 (Limit) If for every distance ε there exists a number n₀ such that d(x_n, y) < ε for each n > n₀ at a fixed value y, the sequence x_n is said to be convergent to y:
x_n → y ,   y = lim_{n→∞} x_n .
y is called the limit (Ger.: Grenzwert) of x_n. All points within an arbitrarily small distance ε of y are called the ε-neighborhood (Ger.: Nachbarschaft) of y.
Not every sequence converges; consider, e.g., a_n = n², which grows without bound, or b_n = 1 + (−1)ⁿ, which keeps oscillating between 0 and 2.
For such sequences one considers the largest and smallest accumulation points,
lim sup_{n→∞} x_n   and   lim inf_{n→∞} x_n .
A sequence converges if
lim sup_{n→∞} x_n = lim inf_{n→∞} x_n .
lim sup_{n→∞} c_n = 4 ,   lim inf_{n→∞} c_n = 0 ,
whereas
sup_n c_n = 4.5
delivers a different result. There are two limit points: the subsequence {c₀,c₂,c₄,...} takes on the limit 4, while the subsequence {c₁,c₃,c₅,...} takes on the limit 0.
The previous definition of a limit is not of practical nature. Once we know the limit point, we can test whether the sequence converges towards it; but if we have no clue whether it converges, and whereto, we cannot say. A more practical definition is thus the following one, in which we do not need to know the limit a priori.
Example 2.16 Let X = C[−1,1] be a set of continuous functions and f_n(t) a sequence of functions, defined by
f_n(t) = { 0            ; t < −1/n
           nt/2 + 1/2   ; −1/n ≤ t ≤ 1/n
           1            ; t > 1/n .
Consider the metric space (X, d₂) with
d₂(f,g) = √( ∫_{−1}^{1} ( f(t) − g(t) )² dt ) .
[Figure: the functions f₁(t) and f_n(t); the transition region shrinks from ±1 to ±1/n.]
Example 2.18 A further good example of Cauchy sequences is obtained when calculating roots by the so-called Heron's method. We want to solve the problem
f(x) = x² − A = 0.
⁵ Josiah Willard Gibbs (11.2.1839–28.4.1903) was an American scientist in physics, chemistry, and mathematics.
[Figure: the sequence f_n(t) converges in the d₂ metric: lim_{n→∞} d₂(f(t), f_n(t)) = 0.]
Newton's iteration reads
x_{n+1} = x_n − f(x_n) / f′(x_n) .
Starting from a rational value x₀ = a/b ∈ Q we obtain
x₁ = a/(2b) + A b/(2a) = (a² + A b²) / (2ab) ,
which is again in Q. We thus stay in Q with every iteration. Nevertheless, for n → ∞ the final value is √A. Thus even if A = 3 ∈ IN, the final value is in IR.
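A small sketch of Heron's iteration (illustrative; exact rational arithmetic is used to make the point that all iterates stay in Q):

from fractions import Fraction

A = Fraction(3)                      # we look for sqrt(3)
x = Fraction(2)                      # rational starting value x_0 = a/b
for n in range(5):
    x = x / 2 + A / (2 * x)          # x_{n+1} = x_n - f(x_n)/f'(x_n) for f(x) = x^2 - A
    print(x, float(x))               # every iterate is a rational number

print(float(x) ** 2)                 # the squares converge towards A = 3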
2.1.4 Robustness
Often it is important not to optimize the mean value (e.g., mean data rate or Bit Error Ratio (BER)) but the value under worst-case conditions. If a system proves to be insensitive to the worst cases, it is called robust. Robustness is often measured in terms of energy. Let x_i be the input sequence and y_i the output sequence of a system T. The output is assumed to be distorted by an unknown sequence v_i. The form of distortion is not necessarily additive or even known. Furthermore, the M initial states z_i of this system are also not known.
Assume that there exists a reference system without distortion having an output signal y_i^{(R)}. The influence of such distortion can be described by the following expression:
Σ_{i=0}^{N} ( y_i − y_i^{(R)} )² / ( Σ_{i=1}^{M} z_i² + Σ_{i=0}^{N} v_i² ) ≤ γ² .
We express the fact that this ratio stays below a positive upper bound by selecting a value γ². As this has to be true for every possible noise and initial state, we have to consider the worst case:
sup_{z_i ∈ l_h(1,M), v_i ∈ l_h(0,N)}  Σ_{i=0}^{N} ( y_i − y_i^{(R)} )² / ( Σ_{i=1}^{M} z_i² + Σ_{i=0}^{N} v_i² ) ≤ γ² .
Assume now that there exist several possibilities for the realization of this system T(F), depending on a specific strategy F. Then the robustness criterion reads
inf_F  sup_{z_i ∈ l_h(1,M), v_i ∈ l_h(0,N)}  Σ_{i=0}^{N} ( y_i − y_i^{(R)} )² / ( Σ_{i=1}^{M} z_i² + Σ_{i=0}^{N} v_i² ) ,
saying that we are interested in finding that strategy F that minimizes this worst-case upper bound.
Example 2.19 A power amplifier (PA) as depicted in Figure 2.8 (left) in a cell phone (Ger.: Handy) can be described by the following nonlinear mapping:
y_i = x_i · ρ_i |x_i|² / ( 1 + ρ_i |x_i|² ) + v_i .
The variable ρ = ρ_i(T) describes the influence of temperature, aging and more. In order to define the robustness of the output signal with respect to ρ, the following expression is considered:
sup_{ρ_i, v_i ∈ l_h(0,N)}  Σ_{i=0}^{N} ( y_i − y_i^{(R)} )² / ( Σ_{i=0}^{N} ρ_i² + Σ_{i=0}^{N} v_i² ) ≤ γ² .
If it is possible to reduce the influence of ρ by new technologies (e.g., a feedback with controller as depicted in Figure 2.8 (right)) or improved circuitry, a smaller factor γ is the result:
sup_{ρ_i, v_i ∈ l_h(0,N)}  Σ_{i=0}^{N} ( y_i − y_i^{(R)} )² / ( Σ_{i=0}^{N} ρ_{i,new}² + Σ_{i=0}^{N} v_i² ) ≤ γ²_new < γ² .
[Figure 2.8: left: power amplifier (PA) producing y_i; right: the same PA embedded in a feedback loop with a controller.]
Definition 2.12 A binary operation ∗ on a set S is a rule that assigns to each ordered pair
(a,b) of elements from S some element from S. Since the operation ends with an element
in the same set, we call this also a closed operation.
An example of such an operation is a ∗ b = min(a,b).
Definition 2.13 (Linear Vector Space) A linear vector space S over a set of scalars T (C, IR, Q, Z, IN, IB) is a set of (objects) vectors together with an additive + and a scalar multiplicative · operation, satisfying the following properties:
3. W.r.t. the multiplicative operation there exists an Identity (One) and a Zero element: 1x = x, 0x = 0.
Definition 2.14 (Group) A set S for which a binary operation ∗ (operation w.r.t two
elements of S) is defined, is called a group if the following holds:
3. for each element a in S there exists an inverse element b in S, so that: a∗b = b∗a = e.
Note that in the previous definitions the binary operations +, ∗, · were placeholders, even though they may suggest classic addition or multiplication. For each number field, the specific operation needs to be tested.
Example 2.21 Let S be the set of vectors of a particular dimension, say n. S forms a
group w.r.t the additive operator, if the following properties are satisfied:
3. For each element x ∈ S there exists a second inverse element y ∈ S, so that x+y = 0,
equivalently y = −x.
Definition 2.15 (Ring) A set S for which two binary operations + and * are defined, is
called a ring if the following holds:
Definition 2.16 (Field) A set S equipped with two binary operations + and * is called a
field (Ger.: Körper), if:
If the set is finite, i.e., |S| < ∞, we talk about finite groups, finite rings, finite fields
and so on.
Example 2.22 The Galois Field GF(2) is a field. It comprises the elements 0 and 1. Let us define the two binary operations + and ∗ as follows:
a b | a + b | a ∗ b
0 0 |   0   |   0
0 1 |   1   |   0
1 0 |   1   |   0
1 1 |   0   |   1
W.r.t. the operation +, S needs to be an Abelian Group:
2. Identity element is 0 : 0 + 0 = 0, 1 + 0 = 1.
4. Associativity: (a + b) + c = a + (b + c).
Example 2.24 Examples for finite dimensional linear vector spaces are:
1. Consider the linear vector space in IR⁴ (set of quadruples, Ger.: Menge der Quadrupel)
x = [1, 5, 4, 2]ᵀ ;  y = [5, 2, 0, −2]ᵀ ;  x + y = [6, 7, 4, 0]ᵀ ;  3x + 2y = [13, 19, 12, 2]ᵀ .
4. Consider the (3,2) single parity check code in GF(2)3 with the elements: V =
{[000],[011],[101],[110]}. We define the operation + as a binary exclusive or oper-
ation. This Galois Field is a linear vector space.
Example 2.26 Let S be the set of polynomials of arbitrary degree (e.g., > 6) and V the
set of polynomials of degree less than 6. Then V is a subspace of S.
Example 2.27 Consider a (n,k) binary linear block code in which k arbitrary bits are
mapped onto n bits. This is a k-dimensional subspace of GF(2)n .
A linear combination of the vectors p₁, p₂, ..., p_m with scalar weights c_i has the form x = c₁p₁ + c₂p₂ + ... + c_m p_m.
Example 2.28 Let S = C(IR), the set of continuous functions over the complex (real) numbers. Let, furthermore, p₁(t) = 1, p₂(t) = t, p₃(t) = t². A linear combination of such functions is given by:
x(t) = c₁ + c₂t + c₃t².
Consider the polynomial x(t) = 6 + 5t + t², and the function p₄(t) = t² − 1. We find:
x(t) = 6p₁(t) + 5p₂(t) + p₃(t) = 7p₁(t) + 5p₂(t) + p₄(t).
Obviously, the description is not unique. The number of required coefficients varies.
Definition 2.19 (Linear Independence) Let S be a linear vector space and T a subset of S. The subset T = {p_i ; i = 1,2,...,m} is said to be linearly independent (Ger.: linear unabhängig) if, for every nonempty finite subset of T, the only set of scalars satisfying the equation
c₁p₁ + c₂p₂ + ... + c_m p_m = 0
is the trivial solution c₁ = c₂ = ... = c_m = 0.
If the above equation is satisfied by a set of scalars that are not all equal to zero, then the
subset (pi ; i = 1,2,...,m) is called linearly dependent (Ger.: linear abhängig).
Example 2.29 The previously presented polynomials p1 (t) = 1; p2 (t) = t, p3 (t) =
t2 , p4 (t) = t2 − 1 are linearly dependent, since:
p1 (t) − p3 (t) + p4 (t) = 0.
The polynomials p1 (t),..., p3 (t) are linearly independent!
The vectors p₁ = [2,−3,4], p₂ = [−1,6,−2] and p₃ = [1,6,2] are linearly dependent since:
4p₁ + 5p₂ − 3p₃ = 0.
It is obviously not trivial to check whether a set of vectors is linearly dependent or not.
Example 2.30 Consider the complex numbers z = r + ji in IR² in vector form:
T = { [r, i]ᵀ , [r, −i]ᵀ } .
This is a set describing z and z*. Note that both elements are linearly independent as long as i is unequal to zero.
Example 2.31 Consider band-limited functions or sequences x_k with Fourier transform X(e^{jΩ}) (also band-limited random processes) that only exist in a frequency range S:
X(e^{jΩ}) = 0 ;  for Ω ∉ S.
For a sequence f_k (linear time-invariant system) with Fourier transform F(e^{jΩ}) existing in the complementary space of S,
F(e^{jΩ}) = 0 ;  for Ω ∈ S,
we find:
y_n = Σ_{k=−∞}^{∞} f_k x_{n−k} ,
Y(e^{jΩ}) = F(e^{jΩ}) X(e^{jΩ}) = 0 ,
x_n = −(1/f₀) Σ_{k=1}^{∞} f_k x_{n−k} − (1/f₀) Σ_{k=−∞}^{−1} f_k x_{n−k} .
Band-limited signals are thus linearly dependent! Note, however, that the property of a finite number of elements is not necessarily satisfied.
Example 2.32 An open issue is to find the linear weights ci in a linear combination given
the result of such combination.
c1 p1 + c2 p2 + ... + cm pm = x.
The vector set {pi } now forms a matrix P , the coefficients a vector c. Given a right hand
side x, do you know now how to obtain the linear weights ci ?
Answer: we have to invert matrix P and find c = P −1 x.
Let us set x = 0. Then a non-trivial solution exists only if the columns of P are linearly dependent.
Example 2.33 Consider a blind channel estimation scheme as depicted in Figure 2.9.
Such schemes were very much of interest in the 90ies as the desire to get rid of redundant
training signals came up. The question was whether a channel estimation is possible at
all without known data signals. A signal sk is transmitted over two linear time-invariant
[Figure 2.9: Blind channel estimation scheme: s_k is transmitted over h₁ and h₂; the received signals r_k^{(1)}, r_k^{(2)} are filtered by g₁ and g₂, and the difference of the outputs s_k^{(1)}, s_k^{(2)} is driven to zero.]
channels h₁ and h₂. If the two linear time-invariant filters are chosen as g₁ = h₂ and g₂ = h₁, then the convolutions result in the same outcome s_k^{(1)} = s_k^{(2)} and the difference of the two becomes
zero. The question is under which conditions is the requirement that both outcomes are
identical, sufficient to obtain the desired estimate for the two channels. This is a tricky
question that will require several iterations in order to solve it completely. Nevertheless, at
this state we can already formulate the problem and provide first conditions.
First, we recognize that if (g₁,g₂) are solutions, so are (αg₁,αg₂), that is, any scaled version. We can thus not expect a unique solution. Second, we recall Bezout's Theorem 1.1 and conclude that h₁ and h₂ need to be co-prime; otherwise a common factor could be split off and attributed to an altered input sequence s_k. The third condition now comes from a consideration in terms of linear dependency. At each sensor (antenna) we observe N elements of the past:
r_k^{(i)} = [ r_k^{(i)}, r_{k−1}^{(i)}, ..., r_{k−N+1}^{(i)} ]ᵀ ;   i = 1,2.
In order to be identical, both convolutions must provide the same result (starting at k = 1):
g_{11} r_1^{(1)} + g_{12} r_2^{(1)} + ... + g_{1m} r_m^{(1)} = g_{21} r_1^{(2)} + g_{22} r_2^{(2)} + ... + g_{2m} r_m^{(2)} .
order two can be identified. We can now answer the following general questions:
How many signals (frequencies Ωo ) of the form exp(jΩo k) are required to identify a
time-invariant system of order M ?
Answer: M different frequencies are required.
How many signals (frequencies Ωo ) of the form sin(Ωo k) are required to identify a
time-invariant system of order M ?
Answer: Only M/2 different frequencies are required.
Definition 2.20 (Span) Let T be a set of vectors in a vector space S over a set of scalars IR (C, Q, Z, IN, IB). The set of vectors V that can be reached by all possible (finite) linear
combinations of vectors in T is the span (Ger.: aufgespannte Menge, erzeugte Menge,
lineare Hülle) of the vectors:
V = span(T ).
Note that span is a short form to describe something that would be lengthy otherwise. We map a set T into a typically larger set V. Take for example T₁ = {x, y}; then
V₁ = span(T₁) = { αx + βy | α, β ∈ K } .
The saving of writing is even more pronounced in the next example, where T₂ = {x_i | i = 1,2,...,N}. We find
V₂ = span(T₂) = { Σ_{i=1}^{N} α_i x_i | α_i ∈ K, x_i ∈ T₂ } .
Definition 2.21 (Hamel Basis) Let S be a vector space, and let T be a set of vectors
from S such that span(T ) = S. If T is linearly independent, then T is said to be a Hamel6
basis for S.
Example 2.34 The vectors p1 = [1,6,5], p2 = [−2,4,2], p3 = [1,1,0] and p4 = [7,5,2] are
linearly dependent. Note that T = {p1 ,p2 ,p3 } spans the space IR3 and thus is a basis for IR3 .
Example 2.35 The vectors p1 = [1,0,0], p2 = [0,1,0] and p3 = [0,0,1] are linearly inde-
pendent and are a basis for IR3 . This basis is often called natural basis (Ger.: natürliche
Basis).
⁶ After the mathematician Georg Karl Wilhelm Hamel (12.9.1877–4.10.1954).
Example 2.36 Consider the (3,2) single parity check code in GF(2)³ with the elements V = {[000],[011],[101],[110]}. A Hamel basis is given by:
G = [ [011]
      [101] ] .
Answer: These matrices are not a basis for IR^{3×3}. However, they are a basis for the subspace of IR^{3×3} whose elements have zero row and column sums!
Definition 2.22 (Cardinality) The number of elements in a set A is its cardinality |A|
(Ger.: Kardinalität).
Theorem 2.1 If two sets T and U are Hamel bases for the same vector space S, then T
and U are of the same cardinality.
Thus, {q 1 ,p2 ,p3 ,...pm } is also a basis for S. Further substitution leads to:
{q 1 ,q 2 ,p3 ,...pm }
{q 1 ,q 2 ,q 3 ,...pm }
... {q 1 ,q 2 ,q 3 ,...,q m−1 ,pm }
{q 1 ,q 2 ,q 3 ,...,q m−1 ,q m }.
Let’s now revisit Example 2.33. Which necessary conditions do we have so far?
1. We recognize that if (g1 ,g2 ) are solutions, so are (αg1 ,αg2 ), that is any scaled version.
We can thus not expect a unique solution.
2. We recall Bezout’s Theorem 1.1 and follow that h1 and h2 need to be co-prime.
Otherwise a common part could be split and would alter the input sequence sk .
3. The input sequence s_k must be persistently exciting.
4. We can enforce that the vectors r_k^{(i)}; i = 1,2; k = 1,2,...,m are linearly dependent. As each vector contains N elements and we have 2m such vectors, we simply select 2m > N. As maximally N vectors span the entire space IR^N, with 2m > N vectors we ensure that some vectors are linearly dependent.
It turns out that these conditions are indeed sufficient for a solution that is unique up to
the scaling of the impulse responses. The existence and uniqueness of the solution is an
important step for finding the solution. The solution itself will be presented in a later
chapter.
Example 2.39 Consider the space of continuous functions C[a,b] with the two elements
x(t) and h(t). Let x(t) be an input signal and h(t) the impulse response of a low pass. We
have:
y(T) = ∫_0^T x(τ) h(T − τ) dτ = ∫_0^T x(τ) g(τ) dτ = ⟨x, g⟩ ,
with g(τ) = h(T − τ).
Example 2.40 A simple example are the vectors in Cⁿ: ⟨x,y⟩ = y^H x. But also matrices can build inner products by the trace operator:
⟨A,B⟩ = tr(B^H A) .
This is also an inner vector product, however with an additional weighting function f (x,y).
To illustrate the inner product of two vectors, let us consider two 2-dimensional vectors
x and y. Their inner product (projection) is a measure of non-orthogonality as depicted in
Figure 2.10. If the two vectors are perpendicular (orthogonal) to each other, the inner prod-
uct is zero; if they are parallel (antiparallel), the inner product is a maximum (minimum).
Definition 2.25 (Banach and Hilbert Space) A complete normed vector space is said to be a Banach space. If there is additionally an inner vector product (so that the norm is an induced norm), the space is said to be a Hilbert space.
Example 2.43 The space of continuous functions (C[a,b],dp ) is for finite p not a Banach
space since it is not complete.
Figure 2.10: The inner product of two 2-dimensional vectors.
Example 2.44 The space of sequences lp (0,∞) is a Banach space. For p = 2 it is also a
Hilbert space.
Example 2.45 The space of functions L_p[a,b] is a Banach space. For p = 2 it is also a Hilbert space. This Hilbert space is often denoted as L₂(IR) for functions and l₂ for sequences.
In the following part of the lecture we will exclusively consider Hilbert spaces (if not noted otherwise). Figure 2.11 illustrates several metric vector spaces.
How do we have to modify the set in order to make the vectors orthonormal?
Answer: we multiply all vectors by 1/2.
Let T = {p1 ,p2 ,...,pn } be a set of vectors. How can we find a set S = {q 1 ,q 2 ,...,qm } with
m smaller or equal to n so that span(S)=span(T ) and the vectors in S are orthonormal?
The answer to this problem is known as the Gram-Schmidt7 method:
1. Take p₁ and construct: q₁ = p₁ / √⟨p₁,p₁⟩ .
2. Build: e₁ = p₂ − ⟨p₂,q₁⟩ q₁ ;   q₂ = e₁ / √⟨e₁,e₁⟩ .
3. Continue: e₂ = p₃ − ⟨p₃,q₁⟩ q₁ − ⟨p₃,q₂⟩ q₂ ;   q₃ = e₂ / √⟨e₂,e₂⟩ , ...
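A compact sketch of these steps for column vectors in IRⁿ (illustrative code, without the re-orthogonalization safeguards a numerical library would use; the example vectors are those of Example 2.34):

import numpy as np

def gram_schmidt(P):
    """Orthonormalize the columns of P; returns Q with span(Q) = span(P)."""
    Q = []
    for p in P.T:
        e = p.astype(float).copy()
        for q in Q:
            e -= np.vdot(q, e) * q          # subtract the projections onto earlier q_i
        norm = np.sqrt(np.vdot(e, e).real)
        if norm > 1e-12:                    # skip (nearly) linearly dependent vectors
            Q.append(e / norm)
    return np.column_stack(Q)

P = np.array([[1., -2., 1.], [6., 4., 1.], [5., 2., 0.]])   # columns p1, p2, p3 of Example 2.34
Q = gram_schmidt(P)
print(np.round(Q.T @ Q, 12))                 # identity matrix: the q_i are orthonormal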
Orthogonal (and orthonormal) bases offer some advantages when calculating with them.
Consider a vector f = [a,b,c]T and let P = {p1 ,p2 ,p3 } be an orthonormal basis. We can
display the vector as
f = ⟨f,p₁⟩ p₁ + ⟨f,p₂⟩ p₂ + ⟨f,p₃⟩ p₃ = [p₁, p₂, p₃] [ ⟨f,p₁⟩, ⟨f,p₂⟩, ⟨f,p₃⟩ ]ᵀ .
The proof of this statement is easily done by left multiplication of [p1 ,p2 ,p3 ]T . The or-
thonormal basis thus works as a filter to separate the individual components of f .
⁷ Erhard Schmidt (13.1.1876–6.12.1959) was a German mathematician.
Definition 2.30 (Biorthonormal) If there are two bases, T = {p1 ,p2 ,p3 ...,pm } and U =
{q 1 ,q 2 ,q 3 ...,q m } that span the same space with the additional property:
⟨p_i, q_j⟩ = k_{ij} δ_{i−j} ,
then these bases are said to be dual or biorthogonal (biorthonormal for kij = 1).
Let f = [a,b]ᵀ; then
f = ⟨f,q₁⟩ p₁ + ⟨f,q₂⟩ p₂ = (a + b) p₁ + b p₂
  = ⟨f,p₁⟩ q₁ + ⟨f,p₂⟩ q₂ = a q₁ + (−1 + b) q₂ .
The biorthogonal basis allows a fast reinterpretation from one basis to the other. This concept has been used widely in the context of wavelets in the 1990s and has become important in modern filter bank designs for wireless transmissions, the so-called Filter Bank Multi-Carrier techniques (FBMC).
Definition 2.31 (Norm) Let S be a vector space with elements x. A real-valued function
kxk is said to be a norm (length) of x, if the following four properties are satisfied:
0) kxk ≥ 0 for every x ∈ S.
1) kxk = 0 iff x = 0.
2) kαxk = |α|kxk.
3) kx + yk ≤ kxk + kyk (triangle inequality).
Note that 0) is not really required since it follows from the other properties! Norms are a special form of metrics; they are tailored to be suitable for the linear vector space. In consequence, metrics are not necessarily norms (see, e.g., the Hamming distance metric).
Example 2.49 Some metrics can directly be used to define a norm. Take for example the d_p(·) metric. We find ‖x‖_p = d_p(x,0). We thus find
l₁-norm:  ‖x‖₁ = Σ_{i=1}^{n} |x_i| ,
l₂-norm:  ‖x‖₂ = √( Σ_{i=1}^{n} |x_i|² ) .
Note that although these are commonly defined norms, adding a multiplicative constant does
not change the norm property. Thus, a normalization can also be found, e.g.,
L₂-norm:  ‖x(t)‖₂ = √( (1/(b−a)) ∫_a^b |x(t)|² dt ) .
Definition 2.32 (Normed Space) A normed, linear space is a pair (S,k · k) in which S
is a vector space and k · k is a norm on S. Metrics of a normed linear space are defined by
norms.
Note that the elements (vectors) of a linear and normed space are not necessarily
normalized vectors.
Only the space is normed, meaning that a norm exists for this space.
An important theorem in the context of norms is the Norm Equivalence Theorem that follows next. Loosely speaking, it claims that the knowledge of one norm is sufficient to conclude on all other norms. More precisely, it shows that convergence in one norm guarantees convergence in any norm. Let us assume that a limit (vector) y* exists. We consider a sequence of vectors y_k with time index k. The convergence of this sequence to y* can be quantified by a norm: lim_{k→∞} ‖y_k − y*‖ = 0. We can also substitute x_k = y_k − y* and consider sequences of vectors that converge to the zero vector.
Theorem 2.2 (Norm Equivalence) Let ‖·‖ and ‖·‖′ be two norms on finite dimensional spaces IRⁿ (Cⁿ, Qⁿ, Zⁿ, INⁿ, IBⁿ); then we have
lim_{k→∞} ‖x_k‖ = 0   iff   lim_{k→∞} ‖x_k‖′ = 0 .
Proof:
Without restricting generality we set (Ger.: oBdA) k · k0 = k · k2 and obtain for the lower
bound:
x = Σ_{i=1}^{n} x_i e_i ,
‖x‖ ≤ Σ_{i=1}^{n} |x_i| ‖e_i‖ ≤ max_i{|x_i|} Σ_{i=1}^{n} ‖e_i‖ ≤ ‖x‖₂ Σ_{i=1}^{n} ‖e_i‖ = (1/α) ‖x‖₂ ,
α ‖x‖ ≤ ‖x‖₂ .
We have used the fact that the l2 norm is an upper bound for the l∞ -norm. This is easily
shown by8
‖x‖₂ = √( Σ_i |x_i|² ) = ‖x‖_∞ √( 1 + Σ_{i≠max} |x_i|² / |x_max|² ) ≥ ‖x‖_∞ .
For the converse direction, consider any x with ‖x‖₂ = c > 0. And we obtain
‖ x / ‖x‖₂ ‖ = ‖x‖ / ‖x‖₂ ≥ β ;   β > 0.
With the above condition, x cannot be the zero vector, and since ‖·‖ > 0 there must be a positive lower bound β, so that c = ‖x‖₂ ≤ (1/β) ‖x‖. Since the property is true for the l₂ norm,
⁸ With this method, it can be shown that ‖x‖_p ≥ ‖x‖_∞.
Note that the equivalence theorem is stated in terms of converging (Cauchy) sequences. This is a consequence of the proof: if a particular norm tends to zero, then so does every other norm. The equivalence is to be understood in this sense.
Norms often find their application in terms of energy relations, be it average energy (l₂-norm) or peak values (l_∞-norm). Thus, norms appear frequently when we describe systems (robustness, nonlinear systems, convergence of learning systems such as adaptive equalizers or controllers).
Example 2.50 Consider a hands-free telephone (Ger.: Freisprechtelefon) as depicted in
Figure 2.12. The far end speaker signal is linearly distorted by echos (h). These echos
are electronically reproduced by filtering with ĥ. If both impulse responses are identical, the
echos disappear and only the local speaker signal vk is transmitted to the far end speaker.
A measure of how much the two agree is given by the norm
kh − ĥk2 .
Once this norm is small, the adaptation algorithm for learning h rests, while for large norms
the algorithm needs to run.
[Figure 2.12: hands-free telephone: the far-end speaker signal s_k passes through the echo path h and is cancelled by the replica ĥ, so that only the local speaker signal v_k is transmitted back.]
[Figure: an adaptive equalizer f processes the received signal r_k to produce ŝ_k ≈ s_k.]
of the form
f (|ŝk − sk |)
which it tries to minimize (adaptive process). The error at its output is given by an error
vector:
e_k = [ ŝ_{k−N} − s_{k−N}, ŝ_{k−N+1} − s_{k−N+1}, ..., ŝ_k − s_k ] = [ e_{k−N}, ..., e_k ].
Let the adaptive equalizer have the property that the error signal from k − 1 to k is mapped
via the learning rule: y = g(x) = x3 . Under which condition is the equalizer adaptive?
For the error we find
e_k = [ e_{k−N}, e_{k−N+1}, ..., e_k ]ᵀ = f(e_{k−1}) = [ e³_{k−N−1}, e³_{k−N}, ..., e³_{k−1} ]ᵀ .
We can thus compare the energy norms of the error vector from one time instant to the next under worst-case conditions:
sup_{e_k ∈ l_h} ‖e_k‖²₂ / ‖e_{k−1}‖²₂ = sup_{e_k ∈ l_h} ( e²_k + e²_{k−1} + ... + e²_{k−N} ) / ( e²_{k−1} + e²_{k−2} + ... + e²_{k−N−1} ) = sup_{e_k ∈ l_h} ( e⁶_{k−1} + e⁶_{k−2} + ... + e⁶_{k−N−1} ) / ( e²_{k−1} + e²_{k−2} + ... + e²_{k−N−1} ) < 1.
As long as |e_k| < 1 we find that the output energy is always smaller than the input energy. Thus, no matter what the error sequence is, as long as it preserves this property, the algorithm ensures that the error terms decrease and thus the output signal becomes s_k.
Example 2.52 Consider the linear system in matrix-vector form,
[ y_{k−N+1}, ..., y_{k−2}, y_{k−1}, y_k ]ᵀ = H_N [ x_{k−N−M+2}, ..., x_{k−2}, x_{k−1}, x_k ]ᵀ,
where H_N is the N × (N+M−1) banded Toeplitz matrix whose rows contain the impulse response h_{M−1}, ..., h₀, each row shifted by one position,
or short:
y N,k = HN xN +M −1,k .
Consider the ratio of input and output energy:
sup_{x_k ∈ l_h} ‖y_{N,k}‖²₂ / ‖x_{N+M−1,k}‖²₂ = f(H_N) ?
The linear system HN will relate the input energy to an output energy. The ratio may
depend on the input sequence xk . But if we ask for the largest possible value for this ratio,
then it should only be dependent on the system HN itself. But at this point we cannot see
how to derive such property. In case the system is an allpass, how would we be able to
identify this property just based on the knowledge of HN ? The answers to these questions
will follow. See for example the discussion after (2.6) and in Section 5.4.3.
Norms can also be defined entrywise on matrices:
‖A‖′_p = ( Σ_{i=1}^{m} Σ_{j=1}^{n} |A_{i,j}|^p )^{1/p} .
The most common one in this context is the so-called Frobenius⁹ norm for p = 2:
‖A‖²_F = Σ_{i=1}^{m} Σ_{j=1}^{n} |A_{i,j}|² = tr(A^H A) .
Later (see Example 4.33) we will recognize that this (squared) norm is identical to the sum of the squared singular values of the matrix:
‖A‖²_F = Σ_{i=1}^{p} σ_i² ;   p = min(m,n) .
⁹ After the German mathematician Ferdinand Georg Frobenius (1849–1917).
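This identity is easy to verify numerically; a small sketch (not from the script) with an arbitrary random matrix:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))

fro = np.sqrt(np.sum(np.abs(A) ** 2))        # entrywise definition of the Frobenius norm
sv = np.linalg.svd(A, compute_uv=False)      # singular values sigma_i
print(fro, np.linalg.norm(A, 'fro'), np.sqrt(np.sum(sv ** 2)))   # all three agree
print(np.sqrt(np.trace(A.T @ A)))            # the tr(A^H A) formulation (real case)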
There exist other norms on matrices. The most common ones are the maximum row and column sums, that is
‖A‖₁ = max_j Σ_{i=1}^{m} |A_{ij}|      (2.1)
‖A‖_∞ = max_i Σ_{j=1}^{n} |A_{ij}| = ‖Aᵀ‖₁ .      (2.2)
that is only the real part results. Both formulations allow to describe inner products by
(energy) norms.
Vector norms can also be used to induce norms on multivariate functions. Consider the following example:
f(x₁,x₂) = |x₁x₂| exp(−|x₁x₂|).
The graph of the function is illustrated in Figure 2.14. We induce a vector norm, i.e.,
‖f‖ = sup_{x=[x₁,x₂]} |f(x₁,x₂)| / ‖x‖ ,
and select the l₂-norm:
‖f‖₂ = sup_{x=[x₁,x₂]} |x₁x₂| exp(−|x₁x₂|) / ‖x‖₂ = sup_{x=[x₁,x₂]} |x₁x₂| exp(−|x₁x₂|) / √(x₁² + x₂²) ≈ 0.3 .
[Figure 2.14: graph of f(x₁,x₂) = |x₁x₂| exp(−|x₁x₂|) over the (x₁,x₂) plane.]
Very common are vector norms that are induced onto a matrix; for example, all vector p-norms define induced matrix norms:
‖A‖_{p,ind} = sup_x ‖Ax‖_p / ‖x‖_p = sup_{‖x‖_p=1} ‖Ax‖_p .
For p = 2 we obtain
‖A‖_{2,ind} = sup_x ‖Ax‖₂ / ‖x‖₂ = sup_{‖x‖₂=1} ‖Ax‖₂ = √( λ_max(A^H A) ) = σ_max(A) .
It can be interpreted as the largest elongation of an ellipsoid that is the output defined by
kAxk2 for an input vector on a unit sphere given by kxk2 = 1, see also Example 4.33.
In mathematical terms the variation in input vectors x stimulates the various modes λi
(eigenvalues) of A at its output, thus displaying the spectrum of these modes. However,
technically speaking, we can also interpret this for a matrix that describes the impulse response of an FIR system. Consider an input vector at time instant k of the form
x_{M,k} = [ exp(jΩk), exp(jΩ(k+1)), ..., exp(jΩ(k+M−1)) ]ᵀ ,
that is, for a system of order M, a harmonic excitation at frequency Ω. Once the time continues we find
x_{M,k+1} = exp(jΩ) x_{M,k} .
In other words, all such vectors are linearly dependent, independent of their order M or
their frequency Ω. Let’s now use such vector series as input of an FIR system with impulse
response
h = [h0 ,h1 ,...,hM −1 ]T .
We can describe the input-output relation by
A_M x_{M,k+l} = [ h₀ ... h_{M−1} 0 ... ; 0 h₀ ... h_{M−1} ... ; ... ; ... 0 h₀ ... h_{M−1} ] [ exp(jΩ(k+l)), ..., exp(jΩ(k+l+m−2)), exp(jΩ(k+l+m−1)) ]ᵀ
= [ H(exp(jΩ)) exp(jΩl), ..., H(exp(jΩ)) exp(jΩ(l+m−2)), H(exp(jΩ)) exp(jΩ(l+m−1)) ]ᵀ
= H(exp(jΩ)) [ exp(jΩl), ..., exp(jΩ(l+m−2)), exp(jΩ(l+m−1)) ]ᵀ = H(exp(jΩ)) x_{M,l} .
If we compute now the 2-induced norm for such a matrix A_M, we find for a growing size M
lim_{M→∞} ‖A_M‖_{2,ind} = lim_{M→∞} sup_x ‖A_M x‖₂ / ‖x‖₂      (2.4)
= lim_{M→∞} sup_{‖x_{M,k+l}‖²₂ = M} ‖A_M x_{M,k+l}‖₂ / ‖x_{M,k+l}‖₂      (2.5)
= max_Ω |H(exp(jΩ))| .      (2.6)
In other words, for a transfer function we can simply detect the frequency at which it takes on its maximum magnitude, that is, the gain of such a linear time-invariant system. The so obtained value is equivalent to the 2-induced norm of the corresponding Toeplitz matrix once its dimension grows large. Note also the further remarks on this in Section 5.4.3.
This result already provides an answer to the questions in Example 2.52, telling us that
there exists an excitation vector of harmonic form for which we have the largest gain. If
this gain is larger than one, we have an amplifying system. If the gain is smaller than one,
there is no excitation for which more energy can come out. We thus have a passive system.
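A numerical sketch of this statement (the impulse response h below is an arbitrary assumption): build the banded Toeplitz matrix H_N of Example 2.52 for growing N and compare its induced 2-norm with max_Ω |H(e^{jΩ})|.

import numpy as np

h = np.array([0.5, 1.0, -0.25])              # an arbitrary FIR impulse response (assumption)
M = len(h)

def conv_matrix(h, N):
    """N x (N+M-1) banded Toeplitz matrix H_N with y = H_N x, as in Example 2.52."""
    H = np.zeros((N, N + M - 1))
    for i in range(N):
        H[i, i:i + M] = h[::-1]
    return H

Omega = np.linspace(-np.pi, np.pi, 4001)
Hmax = np.max(np.abs(np.polyval(h, np.exp(1j * Omega))))   # max |H(e^{jOmega})|

for N in (8, 64, 512):
    HN = conv_matrix(h, N)
    print(N, np.linalg.norm(HN, 2), Hmax)     # the induced 2-norm approaches max |H|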
Although in general ‖A‖_p ≠ ‖A‖_{p,ind}, for two cases the values are identical: ‖A‖_{1,ind} equals the maximum column sum (2.1) and ‖A‖_{∞,ind} equals the maximum row sum (2.2). To understand such identities, we have to consider special cases for ‖x‖₁ = 1 and ‖x‖_∞ = 1.
If the unit norm ‖x‖₁ = 1 is achieved, all energy can be in one element of vector x, that is x = γ e_m with |γ| = 1. The unit vector e_m thus selects the m-th element and neglects the remaining ones. In this case we can compute the maximum of the i-th element of Ax:
max_{‖x‖₁=1} | Σ_j A_{ij} x_j | = max_j |A_{ij}| .
If, on the other hand, the unit norm ‖x‖_∞ = 1 is achieved, all elements of x can be equally strong, that is x = [γ₁, γ₂, ..., γ_M]ᵀ with |γ_i| = 1; i = 1,2,...,M:
max_{‖x‖_∞=1} Σ_j |A_{ij} x_j| = Σ_j |A_{ij}| .
Note, however, that if the matrix is not square, the two norms can be different mappings.
All induced vector-p norms have the submultiplicative property (with y = Bx):
‖AB‖_{p,ind} = sup_x ‖ABx‖_p / ‖x‖_p = sup_x ( ‖Ay‖_p / ‖y‖_p ) ( ‖y‖_p / ‖x‖_p ) ≤ ‖A‖_{p,ind} sup_x ‖y‖_p / ‖x‖_p = ‖A‖_{p,ind} sup_x ‖Bx‖_p / ‖x‖_p = ‖A‖_{p,ind} ‖B‖_{p,ind} .
Example 2.54 Note that not all matrix norms have this submultiplicative property. Take for example ‖A‖_Δ = max_{ij} |A_{ij}|. Consider A = B = [ 1 1 ; 1 1 ]. Here we find ‖AB‖_Δ = 2 > ‖A‖_Δ ‖B‖_Δ.¹⁰
¹⁰ Note, however, that the following relations hold for any unitarily invariant norm: ‖AB‖ ≤ ‖A‖ ‖B‖_{2,ind} and ‖AB‖ ≤ ‖B‖ ‖A‖_{2,ind}.
Example 2.55 Let us return to our previous Example 2.52 and try to understand if norms
can help us to find out if a given matrix relates to an allpass or not. When we consider the
ratio of input and output signal norm, the first problem we encounter is a technical one: the
input vector is of different length than the output vector. This can be helped by extending
the horizon to N → ∞, making the vectors infinitely long:
lim_{N→∞} ‖H_N‖_{p,ind} = lim_{N→∞} sup_{x≠0} ‖H_N x_N‖_p / ‖x_N‖_p .
If we find
lim_{N→∞} ‖H_N‖_{p,ind} = 1,
we still cannot conclude to have an allpass, as this reflects the worst-case scenario over all input sequences. For the case p = 2 we obtain the spectral norm, which means
lim_{N→∞} ‖H_N‖_{2,ind} = max_Ω |H(e^{jΩ})| ,
thus the maximum of the transfer function. At this point we only have a necessary but not
sufficient condition.
Example 2.56 A further problem seems to be that the dimension of HN needs to grow
which is numerically difficult to handle. We could take the state space form instead as we
have matrices of finite dimension there.
[ z_{k+1} ; y_k ] = [ A  b ; cᵀ  d ] [ z_k ; x_k ] .
Let us take, for example, the simple transfer function yk = xk−2 , that is a pure delay by two
time instances. This must be an allpass. We find
z_{k+1} = [ 0 1 ; 0 0 ] z_k + [ 0 ; 1 ] x_k ,
y_k = [ 1, 0 ] z_k + 0 · x_k .
We can combine both equations into
[ z_{k+1} ; y_k ] = [ 0 1 0 ; 0 0 1 ; 1 0 0 ] [ z_k ; x_k ] .
Summing up all energy terms from zero to N , we find that
( ‖z_{N+1}‖² + Σ_{k=1}^{N} |y_k|² ) / ( ‖z₀‖² + Σ_{k=1}^{N} |x_k|² ) = 1 .
Including the initial and the final memory storage, we recognize that the system indeed behaves like an allpass.
Example 2.57 We consider a third allpass example, this time a time-variant system. Let us start with a time-invariant system with two inputs and two outputs,
[ y_k^{(1)} ; y_k^{(2)} ] = [ a  b ; −b  a ] [ x_k^{(1)} ; x_k^{(2)} ] = A x .
As long as b = √(1 − |a|²) and 0 ≤ |a| ≤ 1, the system behaves like an allpass. We can easily extend this behavior to a time-variant system and find
[ y_k^{(1)} ; y_k^{(2)} ] = [ a_k  b_k ; −b_k  a_k ] [ x_k^{(1)} ; x_k^{(2)} ] = A_k x .
With the help of the small gain theorem we can solve stability problems such as the example in Figure 2.15, where nonlinear elements are included in the transfer path. Once the feedback loop is closed, one would like to know conditions under which the closed-loop system remains stable.
Let us consider an input signal x_k, k = 1,2,...,N, denoted by a vector x_N of dimension N × 1. Then, let the response of a system H_N to this signal be given by
y_N = H_N x_N .
Definition 2.34 A mapping H is called l-stable if two positive constants γ,β exist, such
that for all input-signals xN , for the output-signal, the following holds:
ky N k = kHN xN k ≤ γkxN k + β.
[Figure 2.15: feedback loop containing the static nonlinearity y = 1/(1 + h²) and the delay element 0.5 q⁻¹.]
Figure 2.15: Feedback loop including non-linear elements. Under which circumstances is
the closed loop system stable?
Definition 2.35 The smallest positive constant γ, for which the l-stability is fulfilled, is
called gain of the system HN .
We now examine a feedback arrangement of the two systems HN and GN , with the gains
γh and γg , respectively, where the following holds (see Figure 2.16):
y N = HN hN = HN [xN − z N ]
z N = GN g N = GN [uN + y N ].
[Figure 2.16: feedback interconnection of the two systems H_N (input h_N = x_N − z_N, output y_N) and G_N (input g_N = u_N + y_N, output z_N).]
Theorem 2.3 (Small Gain Theorem) If the gains γ_h and γ_g are such that
γ_h γ_g < 1,
then the closed loop is l-stable and its internal signals are bounded by
‖h_N‖ ≤ 1/(1 − γ_h γ_g) [ ‖x_N‖ + γ_g ‖u_N‖ + β_g + γ_g β_h ] ,
‖g_N‖ ≤ 1/(1 − γ_h γ_g) [ ‖u_N‖ + γ_h ‖x_N‖ + β_h + γ_h β_g ] .
hN = xN − z N
g N = uN + y N .
Example 2.58 Let the automatic control of a cell phone power amplifier look like a feedback loop in which the actual power amplifier behaves like
H₂ = [ 0                                 2 ρ₁|x₂|² / (1 + ρ₁|x₂|²) ;
       2.5 ρ₂|x₁|² / (1 + ρ₂|x₁|²)       0                         ]
with two inputs h2 = [x1 ,x2 ]T denoting the I and Q phase of the amplifier. The nonlinear
saturation behavior is parameterized by some positive constants ρ1 ,ρ2 > 0. In general one
expects the amplification to be on the main diagonal of H2 . In this case the engineer has
swapped I and Q and thus the entries moved away from the diagonal. We also recognize
that the gain in the individual paths is slightly different as this is not untypical for analog
devices. Correspondingly a feedback system G2 , often introduced to linearize the nonlinear
power amplifier, is given by:
G₂ = [ 0     1/3 ;
       0.37  0   ] .
In order to apply the small gain theorem we have to compute the gains of the two systems. Let us start with the linear system G₂: z₂ = G₂ g₂. Here we need to compute
‖z₂‖₂ ≤ ‖G₂‖_{2,ind} ‖g₂‖₂ = ‖ [ 0  1/3 ; 0.37  0 ] ‖_{2,ind} ‖g₂‖₂ = 0.37 ‖g₂‖₂ .
The gain for the power amplifier is more challenging to compute. We also have
‖y₂‖₂ ≤ ‖H₂‖_{2,ind} ‖h₂‖₂ = ‖ [ 0  2ρ₁|x₂|²/(1+ρ₁|x₂|²) ; 2.5 ρ₂|x₁|²/(1+ρ₂|x₁|²)  0 ] ‖_{2,ind} ‖h₂‖₂ ≤ 2.5 ‖h₂‖₂ .
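Presumably the intended conclusion is γ_h γ_g = 2.5 · 0.37 = 0.925 < 1, so that the small gain theorem guarantees l-stability of the closed loop. A small numerical sketch (my own illustration; the worst case of H₂ is taken as the constant matrix obtained when the saturation terms approach 1):

import numpy as np

G2 = np.array([[0.0, 1/3], [0.37, 0.0]])
H2_worst = np.array([[0.0, 2.0], [2.5, 0.0]])   # rho*|x|^2/(1+rho*|x|^2) -> 1 in the worst case

gamma_g = np.linalg.norm(G2, 2)                 # induced 2-norm = largest singular value
gamma_h = np.linalg.norm(H2_worst, 2)
print(gamma_g, gamma_h, gamma_g * gamma_h)      # 0.37, 2.5, 0.925 < 1: the loop is l-stable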
Minimizing ‖x − αy‖²₂ over α yields
α = ⟨x,y⟩ / ‖y‖²₂ .
Thus:
0 ≤ min_α ‖x − αy‖²₂ = ‖x‖²₂ − |⟨x,y⟩|² / ‖y‖²₂ ,
|⟨x,y⟩| ≤ ‖x‖₂ ‖y‖₂ .
The Cauchy-Schwarz Inequality is a very powerful tool due to its second part that allows
a maximization or minimization and thus to optimize particular problems. Four of those
are shown in the next examples.
A first example is the correlation coefficient of two random variables x and y,
r_xy = E[(x − m_x)(y − m_y)] / (σ_x σ_y) .
The fact that a correlation coefficient always lies in the range [−1,1] is due to the Cauchy-Schwarz Inequality. To understand this, let us first transform the two variables into zero-mean variables, i.e., x′ = x − m_x and y′ = y − m_y, for which we know that σ_x = σ_{x′} and σ_y = σ_{y′}. As E[x′y′] = ⟨x′,y′⟩, the original problem can thus equivalently be formulated as
r_{x′y′} = E[x′y′] / (σ_{x′} σ_{y′}) .
Due to the definition of σ²_{x′} = E[x′²] we recognize now Cauchy-Schwarz and conclude
−1 ≤ E[x′y′] / √( E[x′²] E[y′²] ) ≤ 1 .
The same relation also holds for the more practical implementation of a correlation coefficient based on a time series. Let us assume we have a set of pairs (x_k, y_k) with k = 1,2,...,N elements; then we find an inner product
⟨x_k, y_k⟩ = (1/N) Σ_{k=1}^{N} x_k y_k ,
Figure 2.17: Matched filter problem: How to design g(t) so that the SNR at its output is
maximal?
symbols a_k s(t − kT) periodically every T with a fixed symbol shape s(t) and an information carrying sequence a_k ∈ {−1,+1}. At the input of the receiver filter we observe the signal r(t). If we assume that the random variables a_k at time instant k are zero mean, we find E[|a_k|²] = 1. We further assume for the additive white noise E[v(t)v*(t+τ)] = σ_v² δ(τ). The question is now which impulse response g(t) manages to maximize the signal-to-noise ratio (SNR). The signal component if the noise is absent is given by ∫ a_k g(t−τ) s(τ) dτ; the noise component in the absence of the signal, on the other hand, is given by ∫ g(t−τ) v(τ) dτ. If both signal energies are compared, we obtain the SNR:
SNR = |⟨g(t−τ), s(τ)⟩|² / ( σ_v² ⟨g(τ), g(τ)⟩ ) .
In order to maximize the SNR, we have to find the right filter g(τ):
max_{⟨g,g⟩=1} SNR = max_{⟨g,g⟩=1} |⟨g(t−τ), s(τ)⟩|² / ( σ_v² ⟨g(τ), g(τ)⟩ ) ≤ max_{⟨g,g⟩=1} (1/σ_v²) ⟨g(t−τ), g(t−τ)⟩ ⟨s(τ), s(τ)⟩ .
In the last step we applied the Cauchy-Schwarz Inequality and augmented the term ⟨s(τ),s(τ)⟩, which we assume to be constant, e.g., ⟨s(τ),s(τ)⟩ = 1. The SNR is thus maximized if and only if we select the filter g(t−τ) = αs(τ). In this case we find that the maximum SNR is given by 1/σ_v².
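A discrete-time sketch of this statement (illustrative pulse shape and noise level of my own choosing): among candidate filters with ⟨g,g⟩ = 1, only g proportional to s attains the bound 1/σ_v².

import numpy as np

rng = np.random.default_rng(1)
s = np.array([1.0, 2.0, 3.0, 2.0, 1.0])          # assumed symbol shape (discrete samples)
s = s / np.linalg.norm(s)                        # normalize so that <s,s> = 1
sigma_v = 0.5

def out_snr(g):
    g = g / np.linalg.norm(g)                    # <g,g> = 1
    signal = np.abs(np.dot(g, s)) ** 2           # |<g,s>|^2
    noise = sigma_v ** 2                         # E|<g,v>|^2 for white noise when <g,g> = 1
    return signal / noise

candidates = {'matched g = s': s,
              'random  g    ': rng.standard_normal(len(s)),
              'flat    g    ': np.ones(len(s))}
for name, g in candidates.items():
    print(name, out_snr(g))
print('upper bound 1/sigma_v^2 =', 1 / sigma_v ** 2)   # attained only by g = alpha*s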
a TDMA frame, i.e., to find the beginning of the frame. Design an optimal synchronization
filter with impulse response fk ; k = 0,1,...,L − 1 that ensures maximally good detection.
In such a TDMA transmission a unique sequence sk is periodically transmitted followed
by (random) data symbols, say ik as depicted in Figure 2.18. We expect that the probability
[Figure 2.18: TDMA frame structure: the unique sequence s is followed by info data i_k, then the next unique sequence s.]
Figure 2.18: Correlator problem: How to optimally detect the unique sequences sk ?
The synchronization filter correlates the received sequence r_k with f_k,
d_k = Σ_{l=0}^{L−1} r_{k+l} f_l* = ⟨r_k, f_k⟩ ,
and the Cauchy-Schwarz bound
max_{f_k} |⟨r_k, f_k⟩|² = ⟨r_k, r_k⟩ ⟨f_k, f_k⟩
is
obtained for fk = αsk . The filter maximizes the signal-to-interference ratio (SIR). Even in
case of a received sequence rk that is distorted by noise, the correlation filter provides the
best choice in terms of signal-to-interference-and-noise ratio (SINR).
1. Energy identity in time and frequency domain. In the literature this mostly refers to Plancherel's Theorem (and not Parseval, which is valid for Fourier series):
∫_{−∞}^{∞} f²(t) dt = (1/2π) ∫_{−∞}^{∞} |F(jω)|² dω .
2. Differentiation in time corresponds to multiplication by jω in the frequency domain, f′(t) ↔ jω F(jω), so that
∫_{−∞}^{∞} f′²(t) dt = (1/2π) ∫_{−∞}^{∞} |F(jω)|² ω² dω .
We now apply the Cauchy-Schwarz Inequality for functions with f(t) → t f(t) and g(t) → f′(t) and obtain
( ∫_{−∞}^{∞} t f(t) f′(t) dt )² ≤ ∫_{−∞}^{∞} t² f²(t) dt · ∫_{−∞}^{∞} f′²(t) dt .
¹² We assume here that the average time is zero, i.e., ∫ f²(t) t dt = 0, and the same goes for the frequency: ∫ A²(ω) ω dω = 0.
Integration by parts gives ∫_{−∞}^{∞} t f(t) f′(t) dt = −(1/2) ∫_{−∞}^{∞} f²(t) dt, or equivalently,
( ∫_{−∞}^{∞} f²(t) t² dt / ∫_{−∞}^{∞} f²(t) dt ) · ( ∫_{−∞}^{∞} f′²(t) dt / ∫_{−∞}^{∞} f²(t) dt ) ≥ 1/4 .
Applying properties 1 and 2 leads to
( ∫_{−∞}^{∞} f²(t) t² dt / ∫_{−∞}^{∞} f²(t) dt ) · ( (1/2π) ∫_{−∞}^{∞} |F(jω)|² ω² dω / (1/2π) ∫_{−∞}^{∞} |F(jω)|² dω ) ≥ 1/4 .
Applying the square root on both sides of the inequality returns the desired relation. The Cauchy-Schwarz Inequality also offers us an answer to the question for which function f(t) equality is obtained: for those functions with t f(t) = α f′(t). The only functions that satisfy this relation are of the form f(t) = exp(−αt²). Thus the Gaussian bell curve is the only function that is as localized in time as in frequency.
A straightforward result for time series is not so easy to derive. We refer the interested
reader to the formulation in [6].
2.5 Exercises
Exercise 2.1 Test whether the following definitions define a metric for vectors with a finite number of elements x_i, y_i:
1. da (x,y) = |xi − yi |,
for x0 = 1.
In the following we consider the so-called approximation problem. Consider the following application: let x be a signal to transmit. Rather than transmitting x directly, we transmit an approximation of x. This approximation is expressed in terms of a few coefficients which carry much less data than the original vector x. If the number of coefficients is much smaller than the number of entries of the vector, we reduce the amount of data that is stored or transmitted. The question remains how this is possible.
We look for a linear combination x̂ = c₁p₁ + c₂p₂ + ... + c_m p_m that approximates x in the best sense, thus such that the error vector
e = x − x̂
becomes minimal.
In order to minimize e, it is of advantage to introduce a norm. If we took an l₁- or l_∞-norm, the problem would become mathematically very difficult to treat. However, utilizing the induced l₂-norm, we typically obtain quadratic equations, solvable by simple derivatives. Later, we will also introduce iterative LS methods that can be used to solve problems in other norms, see Section 3.4. Note that if x is in V, then the error can become zero. However, if x is not in V, it is only possible to find a very small value for ‖e‖₂.
The vectors in T allow us to find an approximate solution with a very small error ‖e‖₂. The receiver knows T. We thus need only to transmit the m coefficients c_i. If the number m of coefficients is much smaller than the number of samples in x, we obtain a considerable data reduction. The price for this is a representation of x that is not exactly x. The quality measure of our approximation is related to the remaining energy in the error vector e. As our hearing and seeing is not perfect, it is sufficient to describe audio and video signals only as precisely as our hearing and seeing works. These principles define the quality of audio and video coders as they are being used today in our cell phones and video cameras.
Now that we have a proper description of the problem,
min_{c₁,c₂,...,c_m} ‖ x − Σ_{i=1}^{m} c_i p_i ‖²₂ ,
we can try to solve it. In order to visualize the problem, let us first consider a single vector in T = {p₁}. We thus have e = x − c₁p₁ and have to minimize ‖e‖²₂:
min_{c₁} ‖e‖²₂ = min_{c₁} ‖x − c₁p₁‖²₂ = min_{c₁} ( x − c₁p₁ )ᵀ ( x − c₁p₁ ) .
∂/∂c₁ ( x − c₁p₁ )ᵀ ( x − c₁p₁ ) = ∂/∂c₁ ( xᵀx − c₁p₁ᵀx − c₁xᵀp₁ + c₁²p₁ᵀp₁ ) = −p₁ᵀx − xᵀp₁ + 2c₁p₁ᵀp₁ = 0,
which yields c_{LS,1} = p₁ᵀx / (p₁ᵀp₁).
As we minimized the squared l₂-norm, we call this method the Least-Squares (LS) method.¹
Geometric Interpretation: See Figure 3.1 for illustration. The so obtained minimal
error e_LS = x − c_{LS,1} p₁ stands orthogonal (perpendicular) on the given direction p₁: ⟨e_LS, p₁⟩ = 0.
¹ The method goes back to its inventor Johann Carl Friedrich Gauß (30.4.1777–23.2.1855), who developed it to recover the return of the asteroid Ceres in 1800.
Figure 3.1: Least squares. Upper row: the right amount c₁p₁ leads to the closest point to x along the direction p₁. Lower row: drawing circles with x as their center, the first circle that touches the line along p₁ has radius ‖e_LS‖.
Note, however, the problem is not restricted to vectors. All objects of the linear vector
space are possible. We can thus write this more generally in terms of inner vector products
of x and p1 and use the vector notation only if explicitly vectors are meant:
∂/∂c₁ ‖x − c₁p₁‖²₂ = ∂/∂c₁ ⟨x − c₁p₁, x − c₁p₁⟩ = ⟨−p₁, x − c₁p₁⟩ + ⟨x − c₁p₁, −p₁⟩ = 0,
where x and p₁ are of identical dimension. They can be vectors, matrices, functions, series and many more. We obtain again
c_{LS,1} = ⟨x, p₁⟩ / ‖p₁‖²₂ .
Let's check whether this orthogonality property that we obtained from the geometric interpretation also holds in general. For this we compute the inner product of e_LS and p₁:
⟨e_LS, p₁⟩ = ⟨x − c_{LS,1} p₁, p₁⟩ = ⟨x, p₁⟩ − ⟨c_{LS,1} p₁, p₁⟩ = ⟨x, p₁⟩ − ( ⟨x, p₁⟩ / ‖p₁‖²₂ ) ⟨p₁, p₁⟩ = ⟨x, p₁⟩ − ⟨x, p₁⟩ = 0 .
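A two-line numerical check of this orthogonality (illustrative vectors):

import numpy as np

x, p1 = np.array([3.0, 1.0, 2.0]), np.array([1.0, 1.0, 0.0])
c_LS = np.dot(x, p1) / np.dot(p1, p1)        # c_LS,1 = <x,p1> / ||p1||^2 = 2
e_LS = x - c_LS * p1
print(c_LS, np.dot(e_LS, p1))                # the LS error is orthogonal to p1: 0.0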
What has worked well for a single component p₁ should also work for a linear combination with m components, based on a set {p₁, p₂, ..., p_m} of m terms. We thus redo the calculation but now with m components. We consider the optimal weight at position k, that is:
∂/∂c_k ‖e‖²₂ = ⟨∂e/∂c_k, e⟩ + ⟨e, ∂e/∂c_k⟩ = 0 .
If we follow the same procedure as before, we recognize that
⟨∂e/∂c_k, e⟩ = −⟨p_k, e⟩
as well as
⟨e, ∂e/∂c_k⟩ = −⟨e, p_k⟩ .
This leads to the condition
⟨p_k, e_LS⟩ + ⟨e_LS, p_k⟩ = 0 .
Replacing e_LS = x − Σ_{i=1}^{m} c_{LS,i} p_i we find
⟨p_k, x − Σ_{i=1}^{m} c_{LS,i} p_i⟩ + ⟨x − Σ_{i=1}^{m} c_{LS,i} p_i, p_k⟩ = 0 .
The condition is satisfied if and only if one of the inner products is zero (as the other is its complex conjugate) and we obtain
⟨x − Σ_{i=1}^{m} c_{LS,i} p_i, p_k⟩ = 0 ,   k = 1,2,...,m
or, equivalently,
⟨x, p_k⟩ = ⟨Σ_{i=1}^{m} c_{LS,i} p_i, p_k⟩ = Σ_{i=1}^{m} c_{LS,i} ⟨p_i, p_k⟩ ,   k = 1,2,...,m.
These are indeed m linear equations for k = 1,2,...,m. If we arrange them line by line, we obtain the following set of equations:
[ ⟨p₁,p₁⟩  ⟨p₁,p₂⟩  ...  ⟨p₁,p_m⟩ ]   [ c_{LS,1} ]   [ ⟨x,p₁⟩ ]
[ ⟨p₂,p₁⟩  ⟨p₂,p₂⟩  ...  ⟨p₂,p_m⟩ ]   [ c_{LS,2} ] = [ ⟨x,p₂⟩ ]
[   ...                           ]   [   ...    ]   [  ...   ]
[ ⟨p_m,p₁⟩ ⟨p_m,p₂⟩ ...  ⟨p_m,p_m⟩ ]  [ c_{LS,m} ]   [ ⟨x,p_m⟩ ]
or in short RcLS = p. The solution of such matrix equation is called (linear) Least-Squares
solution. Whether such a matrix equation has a unique solution, depends solely on matrix
R. Matrix R needs to be positive-definite in order to obtain a unique solution.
Proof:
Let q T = [q1 ,q2 ,...,qm ] be an arbitrary vector:
q^H R q = Σ_{i=1}^{m} Σ_{j=1}^{m} q_i* q_j R_{ij} = Σ_{i=1}^{m} Σ_{j=1}^{m} q_i* q_j ⟨p_j, p_i⟩
        = Σ_{i=1}^{m} Σ_{j=1}^{m} ⟨q_j p_j, q_i p_i⟩ = ⟨ Σ_{j=1}^{m} q_j p_j , Σ_{i=1}^{m} q_i p_i ⟩
        = ‖ Σ_{i=1}^{m} q_i p_i ‖²₂ ≥ 0 .
Conversely, if R is not positive-definite, then a vector q must exist (unequal to the zero
vector) so that:
q H Rq = 0.
² Jørgen Pedersen Gram (27.6.1850–29.4.1916) was a Danish mathematician.
Thus also
‖ Σ_{i=1}^{m} q_i p_i ‖²₂ = 0
and
Σ_{i=1}^{m} q_i p_i = 0 ,
i.e., the vectors p_i would be linearly dependent.
Note that such a method, employing the matrix inverse of the Gramian, requires a large complexity. This can be reduced considerably if the vectors p₁, p₂, ..., p_m are chosen orthogonal (orthonormal). In this case the Gramian becomes a diagonal (identity) matrix. We thus concentrate later on the search for orthonormal bases.
Theorem 3.2 Let $(S,\|\cdot\|)$ be a linear, normed vector space, $T = \{p_1,p_2,\ldots,p_m\}$ a subset of linearly independent vectors from S, and $V = \mathrm{span}(T)$. Given an element x from S, the coefficients $c_{LS,i}$ minimizing the error e in the induced l2-norm of the linear combination
\[
\hat x = c_1 p_1 + c_2 p_2 + \ldots + c_m p_m
\]
are obtained by requiring the error to be orthogonal to every $p_j$.
Proof:
To show (by substitution): requiring $\langle x - \sum_i c_{LS,i}p_i,\, p_j\rangle = 0$ results in one equation for every index $j = 1,2,\ldots,m$. If we combine all m equations in vector form, we obtain
\[
p - R c_{LS} = 0,
\]
which is identical to what we obtained when minimizing the squared error norm. Both conditions lead to the same result.
Note: since $e_{LS}$ is orthogonal to every component $p_j$, $e_{LS}$ must also be orthogonal to the estimate:
\[
\langle e_{LS}, \hat x\rangle = \left\langle e_{LS},\, \sum_{i=1}^{m} c_{LS,i}\,p_i\right\rangle = 0.
\]
In general we find
\[
\langle e_{LS}, \hat x\rangle = 0,\qquad \langle e_{LS}, x\rangle = \langle e_{LS}, e_{LS}\rangle \ge 0.
\]
The latter is zero if and only if $x \in \mathrm{span}\{T\}$.
Example 3.1 A nonlinear system f(x) is excited harmonically. Which amplitudes do the harmonics have? One possibility to solve this problem is to approximate the nonlinear system in the form of polynomials. For each polynomial the harmonics can be pre-computed, and the summation of all terms results in the desired solution. For high-order polynomials this can become very tedious. An alternative possibility is to assume the output as given in the form
\[
f(\sin(x)) = a_0 + a_1\sin(x) + a_2\sin(2x) + \ldots + b_1\cos(x) + b_2\cos(2x) + \ldots - e(x).
\]
We then approximate f(x) by the coefficients $\hat a_0, \hat a_1, \ldots, \hat b_1, \ldots$ and find
\[
e(x) = f(\sin(x)) - \big(\hat a_0 + \hat a_1\sin(x) + \hat a_2\sin(2x) + \ldots + \hat b_1\cos(x) + \hat b_2\cos(2x) + \ldots\big).
\]
Since the functions $\{\sin(nx), \cos(nx)\}$ form an orthogonal basis, the results are readily computed by LS methods.
In general, the set of linear equations $Rc_{LS} = p$ can be solved very easily if the basis vectors are orthogonal. The Gramian is then a diagonal matrix, say D, and the desired solution is $c_{LS} = D^{-1}p$, which only requires m divisions. The situation is even better if all vectors are orthonormal: then R = I and $c_{LS} = p$, so we only have to compute the right-hand side.
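The following is a minimal numerical sketch (not part of the script) of these two cases, using hypothetical random data: the Gramian R and right-hand side p are built from a set of basis vectors, and for an orthonormalized basis the coefficients reduce to the right-hand side alone.

```python
# Sketch with hypothetical data: LS coefficients via the Gramian R c_LS = p,
# and the shortcut c_LS = p for an orthonormal basis.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(8)              # element to approximate
P = rng.standard_normal((8, 3))         # columns p_1, p_2, p_3

R = P.T @ P                             # Gramian of the basis vectors
p = P.T @ x                             # right-hand side <x, p_i>
c_ls = np.linalg.solve(R, p)            # general case

Q, _ = np.linalg.qr(P)                  # orthonormalized basis
c_on = Q.T @ x                          # R = I, so only the right-hand side is needed

e_ls = x - P @ c_ls                     # LS error is orthogonal to all basis vectors
print(np.allclose(P.T @ e_ls, 0.0))     # True
```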
Example 3.2 The matrix equation $Rc = p$ with the non-negative definite matrix R is to be solved:
\[
Rc = p:\quad \begin{bmatrix}2 & 1\\ 1 & 3\end{bmatrix} c = \begin{bmatrix}2\\1\end{bmatrix} \;\Rightarrow\; c = \begin{bmatrix}1\\0\end{bmatrix}.
\]
Instead of inverting R, we can solve the equation iteratively by $c_{k+1} = c_k + \mu(p - Rc_k)$. We start with the initial value $c_0 = 0$ and a step-size value of $\mu = 0.3$ and obtain in the first iteration:
\[
c_1 = 0 + \mu(p - 0) = \mu p = 0.3\begin{bmatrix}2\\1\end{bmatrix} = \begin{bmatrix}0.6\\0.3\end{bmatrix}.
\]
The second iteration reads:
\[
c_2 = c_1 + \mu(p - Rc_1) = \begin{bmatrix}0.6\\0.3\end{bmatrix} + 0.3\left(\begin{bmatrix}2\\1\end{bmatrix} - \begin{bmatrix}2&1\\1&3\end{bmatrix}\begin{bmatrix}0.6\\0.3\end{bmatrix}\right) = \begin{bmatrix}0.75\\0.15\end{bmatrix}.
\]
Figure 3.2 depicts how the iteration continues. The two components of $c_k$ finally reach their destination $[1,0]^T$.
Figure 3.2: Iterative solution: after a few iterations the two values approach their final
destination.
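A minimal sketch of this iteration (same numbers as in Example 3.2) is shown below; it is not part of the script.

```python
# Iterative solution c_{k+1} = c_k + mu (p - R c_k) from Example 3.2.
import numpy as np

R = np.array([[2.0, 1.0], [1.0, 3.0]])
p = np.array([2.0, 1.0])
mu = 0.3

c = np.zeros(2)
for k in range(20):
    c = c + mu * (p - R @ c)
print(c)   # approaches [1, 0]
```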
3.1.4 Pseudoinverses
If we arrange all vectors $p_j$; $j = 1,2,\ldots,m$ in a matrix A:
\[
A = \big[p_1, p_2, \ldots, p_m\big],
\]
then the normal equations can compactly be written as
\[
A^H A\, c_{LS} = A^H x,
\]
and we recognize that $A^H A = R$ resembles the Gramian while $A^H x = p$ is the right-hand side. Inverting the Gramian results in
\[
c_{LS} = [A^H A]^{-1} A^H x. \tag{3.2}
\]
The so obtained matrix Bl = [AH A]−1 AH is called the (left) pseudoinverse of A as multi-
plying A from the left by Bl results in identity. In general there are two possibilities for
constructing pseudoinverses.3
Definition 3.3 (Pseudoinverse) The matrix Bl = (AH A)−1 AH is called left pseudoin-
verse of A. The matrix Br = AH (AAH )−1 is called right pseudoinverse of A.
The right pseudoinverse has its importance in the context of underdetermined LS problems. We will discuss them later. Note also that Eq. (3.2) shows that there is a linear mapping (operator) that translates the observation x into the desired coefficients: once A is known, $B_l$ maps the observation x to the LS coefficients $c_{LS}$. Furthermore,
\[
\hat x = A(A^H A)^{-1} A^H x,
\]
that is, the estimate or approximation $\hat x$ is also a linear mapping of the observation. This mapping, defined as $A(A^H A)^{-1}A^H$, is very particular in its properties.
Definition 3.4 (Projection) A linear mapping of a linear vector space onto itself is called a projection if $P = P^2$. Such an operator is called idempotent.
If P is a projection, then $I - P$ is a projection as well.
Proof:
\[
(I - P)^2 = I - 2P + P^2 = I - P.
\]
³The idea of pseudoinverses was independently developed by E. H. Moore in 1920 and by Roger Penrose in 1955. Often they are called the Moore–Penrose inverses of a matrix.
We thus recognize that $A(A^HA)^{-1}A^H$ is indeed a projection matrix. If $A(A^HA)^{-1}A^H$ maps x to its approximation $\hat x$, what does the corresponding projection $I - A(A^HA)^{-1}A^H$ do? While $\hat x$ lies in the column space of A (it is a linear combination of the column vectors of A), the LS error $e_{LS}$ does not. It is orthogonal to the approximation, which we can easily verify:
\[
\langle \hat x, e_{LS}\rangle = \left\langle A(A^HA)^{-1}A^H x,\; \big(I - A(A^HA)^{-1}A^H\big)x\right\rangle = 0.
\]
Denoting by V the column space of A and by W its orthogonal complement, $S = V + W$, we can decompose every $x = v + w$ with $v \in V$, $w \in W$ and find the two mappings
\[
\hat x = A(A^HA)^{-1}A^H x = A(A^HA)^{-1}A^H (v+w) = v,
\]
\[
e_{LS} = \big(I - A(A^HA)^{-1}A^H\big)x = \big(I - A(A^HA)^{-1}A^H\big)(v+w) = w.
\]
Orthogonal projections are self-adjoint, $P[.] = P^*[.]$ (for notation and more details see Section 4.1.3 ahead); oblique projections are not. The formal definitions will follow in the next chapter.
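As a small illustration (not part of the script, with hypothetical data), the following sketch forms the projection $A(A^HA)^{-1}A^H$ from the left pseudoinverse and checks idempotency and the orthogonality of the error:

```python
# Projection onto the column space of A via the left pseudoinverse (real case).
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 2))
x = rng.standard_normal(6)

B_left = np.linalg.inv(A.T @ A) @ A.T     # left pseudoinverse
P = A @ B_left                            # projection matrix

x_hat = P @ x                             # approximation in R(A)
e_ls = (np.eye(6) - P) @ x                # LS error in the orthogonal complement

print(np.allclose(P @ P, P))              # idempotent: P^2 = P
print(np.isclose(x_hat @ e_ls, 0.0))      # <x_hat, e_LS> = 0
```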
Consider the relaxed projection iteration towards a fixed point $f^*$ of the projection P, i.e., $Pf^* = f^*$:
\[
f_{k+1} = f_k + \mu(P f_k - f_k).
\]
Subtracting $f^*$ on both sides and using $Pf^* = f^*$ we obtain
\[
\underbrace{f_{k+1} - f^*}_{\tilde f_{k+1}} = f_k - f^* + \mu(P f_k - P f^* + f^* - f_k),
\qquad
\tilde f_{k+1} = \tilde f_k + \mu(P\tilde f_k - \tilde f_k).
\]
For the squared norm this yields
\[
\|\tilde f_{k+1}\|_2^2 = (1-\mu)^2\|\tilde f_k\|_2^2 + \big(\mu^2 + 2\mu(1-\mu)\big)\|P\tilde f_k\|_2^2
= \Big[(1-\mu)^2 + \big(\mu^2 + 2\mu(1-\mu)\big)\varepsilon\Big]\|\tilde f_k\|_2^2,
\]
where $\varepsilon = \|P\tilde f_k\|_2^2/\|\tilde f_k\|_2^2 \le 1$.
For step-sizes $0 < \mu < 2$ it can be shown that the factor in front of the norm is strictly bounded between $\varepsilon$ and one; thus the energy term $\|\tilde f_k\|_2^2$ decays with every iteration and must finally end up at zero. If the norm is zero, the vector is zero and thus $\lim_{k\to\infty} f_k = f^*$. Figure 3.4 illustrates this behavior for two values of $\varepsilon$. The algorithm has seen a renaissance recently (around 2010) and many interesting generalizations have been developed. Many learning algorithms (machine learning) can be interpreted as a relaxed projection.
Figure 3.4: The contraction factor $(1-\mu)^2 + \big(\mu^2+2\mu(1-\mu)\big)\varepsilon = (1-\varepsilon)(1-\mu)^2 + \varepsilon \le 1$ for $0\le\mu\le 2$, plotted over $\mu$ for $\varepsilon = 0.1$ and $\varepsilon = 0.9$.
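Below is a minimal sketch (not from the script) of the relaxed projection iteration for the special case of a single orthogonal projection onto a subspace; the component of f outside the range of P is damped by $(1-\mu)$ per step, so the iterate converges to the fixed point $Pf_0$.

```python
# Relaxed projection f_{k+1} = f_k + mu (P f_k - f_k), 0 < mu < 2.
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 2))
P = A @ np.linalg.inv(A.T @ A) @ A.T     # orthogonal projection onto R(A)

f0 = rng.standard_normal(5)
f = f0.copy()
mu = 0.7
for k in range(60):
    f = f + mu * (P @ f - f)

print(np.allclose(f, P @ f0))            # True: the limit is a fixed point of P
```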
Example 3.3 Let us consider a polynomial fit: given a function f (x), it is to be fit by a
polynomial p(x) of order m optimally in the interval [a,b]. For this the following quadratic
cost function is selected:
\[
\min_{c_0,c_1,\ldots,c_{m-1}}\; \int_a^b \big(f(x) - c_0 - c_1 x - \ldots - c_{m-1}x^{m-1}\big)^2\,dx.
\]
Applying LS we find the following linear set of equations in the unknown coefficients
\[
\begin{bmatrix}
\langle 1,1\rangle & \langle x,1\rangle & \cdots & \langle x^{m-1},1\rangle\\
\langle 1,x\rangle & \langle x,x\rangle & \cdots & \langle x^{m-1},x\rangle\\
\vdots & \vdots & \ddots & \vdots\\
\langle 1,x^{m-1}\rangle & \langle x,x^{m-1}\rangle & \cdots & \langle x^{m-1},x^{m-1}\rangle
\end{bmatrix}
\begin{bmatrix}c_0\\ c_1\\ \vdots\\ c_{m-1}\end{bmatrix}
=
\begin{bmatrix}\langle f(x),1\rangle\\ \langle f(x),x\rangle\\ \vdots\\ \langle f(x),x^{m-1}\rangle\end{bmatrix},
\]
with the inner products
\[
\langle x^i, x^j\rangle = \int_a^b x^{i+j}\,dx = \frac{b^{i+j+1}-a^{i+j+1}}{i+j+1}.
\]
Normalizing the interval to [0,1], we obtain for the Gramian the so-called Hilbert matrix:
\[
R = \begin{bmatrix}
1 & \frac12 & \cdots & \frac1m\\[2pt]
\frac12 & \frac13 & \cdots & \frac1{m+1}\\
\vdots & \vdots & \ddots & \vdots\\[2pt]
\frac1m & \frac1{m+1} & \cdots & \frac1{2m-1}
\end{bmatrix}.
\]
For this matrix it is known that with growing order m it becomes very poorly conditioned. It is thus difficult to invert the matrix numerically. For this reason, (simple) polynomials are typically not used for approximation problems. For small values of m (m < 4) the effect is not so dramatic.
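The rapid growth of the condition number can be checked directly; the following short sketch (not from the script) builds the Hilbert-matrix Gramian for a few orders:

```python
# Condition number of the Hilbert-matrix Gramian grows rapidly with m.
import numpy as np

for m in (3, 6, 9, 12):
    i = np.arange(m)
    R = 1.0 / (i[:, None] + i[None, :] + 1)   # R[i, j] = 1 / (i + j + 1)
    print(m, np.linalg.cond(R))
```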
Example 3.4 The function $e^x$ is to be approximated by polynomials. The Taylor series results in $e^x = 1 + x + x^2/2 + \ldots$, whereas LS delivers $1.013 + 0.8511x + 0.8392x^2 + \ldots$. Figure 3.5 depicts the approximation error over the interval [0,1]. The Taylor series⁵ (here a Maclaurin series) provides good results only close to its centre point, that is zero. We can thus improve the Taylor series approach by shifting its centre point into the middle of the interval. Still, at the border of the interval the LS approach shows better results. In this application what counts is the largest possible error. If we want the largest error to become minimal, it is not sufficient to minimize the L2-norm; in this case the L∞-norm is required:
\[
\min_{c_0,c_1,\ldots,c_{m-1}}\;\lim_{p\to\infty}\left(\int_a^b \big|f(x)-c_0-c_1x-\ldots-c_{m-1}x^{m-1}\big|^p\,dx\right)^{\!1/p},
\]
which is not an LS problem but can be treated as a so-called weighted LS problem, see Section 3.4.
Example 3.5 Linear Regression Probably the most popular application of LS. The intention is to fit a line so that the distance (more precisely the vertical distance, measured along the y-direction) between the observations and their projection onto the line becomes (quadratically) minimal. Figure 3.6 illustrates the typical scenario. Given the data pairs $(x_i,y_i)$, how do we find the line with minimum distance? For this we write down a set of equations, one for each data pair:
\[
\begin{bmatrix}y_1\\ y_2\\ \vdots\\ y_n\end{bmatrix}
=
\begin{bmatrix}ax_1+b\\ ax_2+b\\ \vdots\\ ax_n+b\end{bmatrix}
+
\begin{bmatrix}e_1\\ e_2\\ \vdots\\ e_n\end{bmatrix}
=
\begin{bmatrix}x_1 & 1\\ x_2 & 1\\ \vdots & \vdots\\ x_n & 1\end{bmatrix}
\begin{bmatrix}a\\ b\end{bmatrix}
+
\begin{bmatrix}e_1\\ e_2\\ \vdots\\ e_n\end{bmatrix},
\]
⁵After the English mathematician Brook Taylor (18 August 1685 – 29 December 1731) and the Scottish mathematician Colin Maclaurin (February 1698 – 14 June 1746).
or in short
y = Ac + e.
We thus obtain the LS solution as:
c = (AH A)−1 AH y.
Unusual here is the fact that the matrix A, out of which we compose the Gramian $R = A^HA$, contains data itself. It can thus not be precomputed; for each data set it has to be computed anew. Note, however, that the Gramian in this case is a square matrix of order two. Nevertheless, depending on the given data the problem can be more or less numerically challenging.
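A minimal sketch of this regression (not from the script, with hypothetical synthetic data) follows; `np.linalg.lstsq` computes exactly the solution $(A^HA)^{-1}A^Hy$:

```python
# Fitting a line y = a x + b by LS to synthetic data.
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 0.5 + 0.1 * rng.standard_normal(50)   # hypothetical data pairs

A = np.column_stack([x, np.ones_like(x)])            # rows [x_i, 1]
c, *_ = np.linalg.lstsq(A, y, rcond=None)
a_hat, b_hat = c
print(a_hat, b_hat)                                   # close to 2.0 and 0.5
```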
Example 3.6 Another example in which the observation data influence the Gramian ma-
trix is given next. In order to describe observations in terms of simple and compact key
parameters, often so-called parametric process models are applied. A frequently used process is the Auto-Regressive (AR) process, which is built by linear filtering of past values:
\[
x_k = a_1 x_{k-1} + \ldots + a_{n_P} x_{k-n_P} + v_k = \sum_{n=1}^{n_P} a_n x_{k-n} + v_k.
\]
The (driving) process vk is a white random process with unit variance. AR processes are
applied to model strong spectral peaks:
\[
x_k = \frac{1}{1 - a_1 q^{-1} - \ldots - a_{n_P} q^{-n_P}}\, v_k,
\]
\[
s_{xx}(e^{j\Omega}) = \sum_{l=-\infty}^{\infty} E[x_k x_{k+l}]\, e^{-j\Omega l}
= \left|\frac{1}{1 - a_1 e^{-j\Omega} - \ldots - a_{n_P} e^{-j\Omega n_P}}\right|^2.
\]
A typical (short-time) spectrum of human speech is shown in Figure 3.7. Here three resonances of the vocal tract, so-called formants, are visible; the order of the process is thus $n_P = 3$. Linear prediction of speech signals has already been discussed in Example 1.2,
Figure 3.7: Typical short-time spectrum of human speaker, showing formants at 120, 530
and 3400 Hz.
where we argued that the linear prediction method is not a linear operator. LS methods can be used to estimate the parameters $a_1,\ldots,a_{n_P}$ of an AR process:
\[
x_k = a_1 x_{k-1} + a_2 x_{k-2} + \ldots + a_{n_P} x_{k-n_P} + v_k.
\]
Stacking M + 1 successive values of the observations in vectors, we obtain
\[
\begin{bmatrix}x_k\\ x_{k-1}\\ \vdots\\ x_{k-M}\end{bmatrix}
= a_1\begin{bmatrix}x_{k-1}\\ x_{k-2}\\ \vdots\\ x_{k-M-1}\end{bmatrix}
+ a_2\begin{bmatrix}x_{k-2}\\ x_{k-3}\\ \vdots\\ x_{k-M-2}\end{bmatrix}
+ \ldots
+ \begin{bmatrix}v_k\\ v_{k-1}\\ \vdots\\ v_{k-M}\end{bmatrix},
\]
which we can write more compactly as
\[
x_k = a_1 x_{k-1} + a_2 x_{k-2} + \ldots + a_{n_P} x_{k-n_P} + v_k
= [x_{k-1}, x_{k-2}, \ldots, x_{k-n_P}]\, a + v_k
= X_{n_P,k}\, a + v_k.
\]
An estimate for a can be found from the observation $x_k$ by minimizing the estimation error:
\[
\min_a \|x_k - X_{n_P,k}\, a\|_2^2,
\]
\[
0 = X_{n_P,k}^H x_k - X_{n_P,k}^H X_{n_P,k}\, a, \tag{3.5}
\]
\[
a = \big(X_{n_P,k}^H X_{n_P,k}\big)^{-1} X_{n_P,k}^H x_k. \tag{3.6}
\]
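A small numerical sketch of Eq. (3.6) (not from the script) follows; it generates a hypothetical AR(2) process and recovers its coefficients by LS:

```python
# LS estimation of AR coefficients from observations, as in Eqs. (3.5)-(3.6).
import numpy as np

rng = np.random.default_rng(4)
a_true = np.array([1.2, -0.6])            # hypothetical AR(2) coefficients
N = 2000
x = np.zeros(N)
v = rng.standard_normal(N)
for k in range(2, N):
    x[k] = a_true[0] * x[k-1] + a_true[1] * x[k-2] + v[k]

X = np.column_stack([x[1:-1], x[:-2]])    # rows [x_{k-1}, x_{k-2}] for k = 2, ..., N-1
xk = x[2:]
a_hat = np.linalg.lstsq(X, xk, rcond=None)[0]
print(a_hat)                               # close to [1.2, -0.6]
```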
Example 3.7 Channel Estimation A known training sequence $a_k$ is transmitted over an FIR channel with coefficients $h_0, h_1, h_2$ and observed in additive noise:
\[
r_k = h_0 a_k + h_1 a_{k-1} + h_2 a_{k-2} + v_k.
\]
Combining several of these for $k = 2,3,\ldots,L-1$ we can formulate this in vector form:
\[
\begin{bmatrix}r_{L-1}\\ r_{L-2}\\ r_{L-3}\\ \vdots\\ r_2\end{bmatrix}
=
\begin{bmatrix}
h_0 & h_1 & h_2 & & \\
 & h_0 & h_1 & h_2 & \\
 & & \ddots & \ddots & \ddots\\
 & & & h_0 & h_1 \;\; h_2
\end{bmatrix}
\begin{bmatrix}a_{L-1}\\ a_{L-2}\\ a_{L-3}\\ \vdots\\ a_0\end{bmatrix}
+
\begin{bmatrix}v_{L-1}\\ v_{L-2}\\ v_{L-3}\\ \vdots\\ v_2\end{bmatrix}
\]
\[
=
\begin{bmatrix}
a_{L-1} & a_{L-2} & a_{L-3}\\
a_{L-2} & a_{L-3} & a_{L-4}\\
a_{L-3} & a_{L-4} & a_{L-5}\\
\vdots & \vdots & \vdots\\
a_2 & a_1 & a_0
\end{bmatrix}
\begin{bmatrix}h_0\\ h_1\\ h_2\end{bmatrix}
+
\begin{bmatrix}v_{L-1}\\ v_{L-2}\\ v_{L-3}\\ \vdots\\ v_2\end{bmatrix},
\]
or, in short,
\[
r = Ha + v = Ah + v.
\]
The upper form of the description uses a Toeplitz channel matrix, exactly as we learned in Section 2.2.2. From there we know that the training sequence $a_k$ needs to be persistently exciting, here of order three. Note that with this reformulation the channel matrix, which was in Toeplitz form, no longer appears in the second line; instead, a data matrix in Hankel form appears.
With this reformulation the channel can be estimated by the Least-Squares method:
r = Ah + v
ĥ = [AH A]−1 AH r.
Note that the training sequence is already known at the receiver and thus the pseudoinverse $[A^HA]^{-1}A^H$ can be precomputed. A persistently exciting training sequence $a_k$ guarantees the existence of the pseudoinverse.
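The following sketch (not from the script, with a hypothetical channel and a random binary training sequence) illustrates the LS channel estimate $\hat h = [A^HA]^{-1}A^Hr$ with the data matrix in Hankel form:

```python
# LS channel estimation from a known training sequence.
import numpy as np

rng = np.random.default_rng(5)
h = np.array([0.8, -0.4, 0.2])                    # hypothetical channel h0, h1, h2
L = 64
a = rng.choice([-1.0, 1.0], size=L)               # training symbols
r = np.convolve(a, h)[:L] + 0.01 * rng.standard_normal(L)

# rows [a_k, a_{k-1}, a_{k-2}] for k = 2, ..., L-1
A = np.column_stack([a[2:], a[1:-1], a[:-2]])
h_hat = np.linalg.lstsq(A, r[2:], rcond=None)[0]  # [A^H A]^{-1} A^H r
print(h_hat)                                       # close to h
```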
Example 3.8 Iterative Receiver Consider once again the equivalent description of the
previous example. We can continue now after the L training symbols a0 ,a1 ,...,aL−1 and use
the estimated symbols ãL ,ãL+1 ,...,ã2L−1 instead:
\[
\begin{bmatrix}r_{2L-2}\\ r_{2L-3}\\ r_{2L-4}\\ \vdots\\ r_{L+1}\end{bmatrix}
=
\begin{bmatrix}
h_0 & h_1 & h_2 & & \\
 & h_0 & h_1 & h_2 & \\
 & & \ddots & \ddots & \ddots\\
 & & & h_0 & h_1 \;\; h_2
\end{bmatrix}
\begin{bmatrix}\tilde a_{2L-1}\\ \tilde a_{2L-2}\\ \tilde a_{2L-3}\\ \vdots\\ \tilde a_L\end{bmatrix}
+
\begin{bmatrix}v_{2L-2}\\ v_{2L-3}\\ v_{2L-4}\\ \vdots\\ v_{L+1}\end{bmatrix}
\]
\[
=
\begin{bmatrix}
\tilde a_{2L-1} & \tilde a_{2L-2} & \tilde a_{2L-3}\\
\tilde a_{2L-2} & \tilde a_{2L-3} & \tilde a_{2L-4}\\
\tilde a_{2L-3} & \tilde a_{2L-4} & \tilde a_{2L-5}\\
\vdots & \vdots & \vdots\\
\tilde a_{L+2} & \tilde a_{L+1} & \tilde a_L
\end{bmatrix}
\begin{bmatrix}h_0\\ h_1\\ h_2\end{bmatrix}
+
\begin{bmatrix}v_{2L-2}\\ v_{2L-3}\\ v_{2L-4}\\ \vdots\\ v_{L+1}\end{bmatrix},
\]
or, in short,
\[
r = H\tilde a + v = \tilde A h + v.
\]
This means that the transmitted symbols as well as the channel coefficients can be estimated in a ping-pong manner by an LS method. Once the channel estimates are improved, the data symbols are re-estimated by
\[
a_{LS} = (H^H H)^{-1} H^H r,
\]
and so on. This is the principle of a so-called iterative receiver, which is illustrated in Figure 3.8. Typically the soft decoded data symbols, which may be the outcome of an LS estimation, are fed back, but the final decision requires so-called hard symbols. A slicer (quantization device) forces the soft symbols into symbols of an allowed alphabet.
Example 3.9 Consider an underdetermined system of equations, i.e., there are more pa-
rameters to estimate than observations.
\[
\begin{bmatrix}1 & 2 & -3\\ -5 & 4 & 1\end{bmatrix}
\begin{bmatrix}x_1\\ x_2\\ x_3\end{bmatrix}
=
\begin{bmatrix}-4\\ 6\end{bmatrix}.
\]
Figure 3.8: Iterative receiver: channel estimation (pseudoinverse $A^\#$) and symbol estimation (pseudoinverse $H^\#$) alternate; a slicer maps the soft symbols $\tilde a$ to hard symbols $\hat a$.
As the system is underdetermined, it has many solutions. Find one solution, e.g., $x = [1,2,3]^T$. Then we can describe the manifold of solutions by
\[
x = \begin{bmatrix}1\\2\\3\end{bmatrix} + t\begin{bmatrix}1\\1\\1\end{bmatrix};\qquad t\in\mathbb{C}.
\]
Of all these solutions, the one with the least norm is of most interest. It is given by
\[
\hat x = A^H[AA^H]^{-1} b.
\]
Note that $A^H(AA^H)^{-1}$ is also a pseudoinverse of A since $A\,A^H(AA^H)^{-1} = I$. It is called the right pseudoinverse. Surprisingly, this solution always delivers the minimum norm solution. The reason for this is that all other solutions have additional components that are not linear combinations of the columns of $A^H$ (they are not in the column space of $A^H$).
Example 3.10 Consider the previous example again. The minimum norm solution is given by $x = [-1,0,1]^T$ and not, as possibly assumed, $[1,2,3]^T$! A solution can be found by linear combinations of the row vectors of the matrix, i.e., $[1,2,-3]$ and $[-5,4,1]$.
\[
x = \begin{bmatrix}-1\\0\\1\end{bmatrix} + t\begin{bmatrix}1\\1\\1\end{bmatrix};\qquad t\in\mathbb{C},
\]
\[
2\begin{bmatrix}1\\2\\-3\end{bmatrix} - \begin{bmatrix}-5\\4\\1\end{bmatrix} = \begin{bmatrix}7\\0\\-7\end{bmatrix} = -7\begin{bmatrix}-1\\0\\1\end{bmatrix}.
\]
The minimum norm solution can thus be composed as a linear combination of the row vectors (row space) of the underdetermined matrix. Note that $[-1,0,1]^T$ is in the row space of A but $[1,2,3]^T$ and $[1,1,1]^T$ are not.
Now, is $[-1,0,1]^T$ truly the minimum norm solution? We check that by writing straightforwardly
\[
\|x\|_2^2 = \left\|\begin{bmatrix}-1\\0\\1\end{bmatrix} + t\begin{bmatrix}1\\1\\1\end{bmatrix}\right\|_2^2
= (-1+t)^2 + t^2 + (1+t)^2.
\]
Differentiating with respect to t delivers:
\[
\frac{\partial \|x\|_2^2}{\partial t} = 2(-1+t) + 2t + 2(1+t) = 6t = 0 \;\Rightarrow\; t = 0.
\]
So, indeed, the minimum norm solution is given by $[-1,0,1]^T$. The reader may try the same with the non-minimum-norm solution $[1,2,3]^T$.
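A short numerical check (not from the script, assuming the 2×3 system reconstructed above) confirms that the right pseudoinverse delivers the minimum norm solution:

```python
# Minimum norm solution via the right pseudoinverse A^H (A A^H)^{-1} b.
import numpy as np

A = np.array([[1.0, 2.0, -3.0],
              [-5.0, 4.0, 1.0]])
b = np.array([-4.0, 6.0])

x_min = A.T @ np.linalg.inv(A @ A.T) @ b
print(x_min)                                            # [-1, 0, 1]
print(np.allclose(A @ np.array([1.0, 2.0, 3.0]), b))    # [1, 2, 3] also solves the system
print(np.linalg.norm(x_min), np.linalg.norm([1.0, 2.0, 3.0]))  # sqrt(2) < sqrt(14)
```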
For some value of λ the first one is identical to the previous sparse problem. Thus, the
problem of finding λ remains.
The second formulation is a convex approximation for which efficient numerical solutions
exist. It is typically the preferred formulation for compressive sensing problems6 .
Example 3.11 Let us revisit Example 1.5 from the first chapter. We found that with
help of Bezout’s theorem an infinite amount of solutions exist for the equalizer problem.
But which solution is the best? This can be answered in the context of additive noise. If
we transmit data over a channel we model further uncertainties by additive noise. In its
simplest form it originates from thermal behavior of the electronic circuits. We thus receive
now
In addition to the delayed version of the input we now experience noise filtered by the equalizers. If we want to find those equalizers that minimize the noise power, we have to minimize $\|[g^{(1)T}, g^{(2)T}]\|_2^2$, assuming that the noise sequences are of equal power and statistically independent. Fortunately, the LS approach delivers exactly this solution when solving
\[
\begin{bmatrix}
h_0^{(1)} & & h_0^{(2)} & \\
h_1^{(1)} & h_0^{(1)} & h_1^{(2)} & h_0^{(2)}\\
 & h_1^{(1)} & & h_1^{(2)}
\end{bmatrix}
\begin{bmatrix}g_0^{(1)}\\ g_1^{(1)}\\ g_0^{(2)}\\ g_1^{(2)}\end{bmatrix}
=
\begin{bmatrix}0\\1\\0\end{bmatrix}.
\]
\[
c = [A^H W A]^{-1} A^H W y.
\]
⁶In the context of sparse problems, an important matrix property is its spark. It is the smallest number of linearly dependent columns (or rows). Take for example the matrix
\[
A = \begin{bmatrix}1 & 2 & 0 & 1\\ 1 & 2 & 0 & 2\\ 1 & 2 & 0 & 3\\ 0 & 2 & 1 & 4\end{bmatrix}.
\]
The spark of this matrix is three as the first three columns can be linearly combined to the zero vector. The spark can be seen as a counterpart to the rank (see Definition 4.15) of a matrix.
where we used the fact that powers are monotone functions and thus preserve minima. The latter can be reformulated into
\[
\min_c \|x - Ac\|_p^p = \min_c \sum_{i=1}^{m} \underbrace{|x_i - (Ac)_i|^{p-2}}_{w_i}\; |x_i - (Ac)_i|^2.
\]
We interpret a part of the norm as a weighting term $w_i$. The problem is thus formulated as a classical quadratic problem with a diagonal weighting matrix W. To solve this, an iterative algorithm is known: start with an initial LS estimate and then continue as follows. In the last step a convex linear combination is selected with some value $\lambda \in [0,1]$.
Here we placed all cos(·) terms in the vector $c(\Omega) = 2[1, \cos(\Omega), \cos(2\Omega),\ldots, \cos(N\Omega)]$. If we used a quadratic measure (as we did in a previous example), we would obtain the Fourier coefficients; the magnitude response would be approximated only moderately. With a larger norm, however, $p \to \infty$, a much better result is obtained (equiripple design):
\[
\lim_{p\to\infty}\;\min_{H_r(e^{j\Omega})}\int_0^{\pi}\Big|H_r(e^{j\Omega}) - |H_d(e^{j\Omega})|\Big|^p\, d\Omega.
\]
As the p-th root is a monotone function, we dropped it for the optimization process. In Figure 3.9 we depict the obtained result when 81 (N = 40) coefficients are applied to obtain
an ideal low pass filter. We recognize that the equiripple design allows a much steeper drop around the cut-off frequency. However, the attenuation at larger frequencies is better for the Fourier solution (p = 2). The algorithm to obtain a solution for $p \to \infty$ is called the Remez algorithm in the literature.
We approximate a function x(t) in the LS sense (L2-norm) using orthonormal functions $p_i(t)$:
\[
\hat x(t) = \sum_{i=1}^{m} c_i\, p_i(t).
\]
Due to the orthonormality of the basis functions $p_i(t)$; $i = 1,2,\ldots,m$, we find
\[
\left\|x(t) - \sum_{i=1}^{m} c_i\, p_i(t)\right\|_2^2
= \|x(t)\|_2^2 - \sum_{i=1}^{m}\big|\underbrace{\langle x(t),p_i(t)\rangle}_{c_{LS,i}}\big|^2
= \|x(t)\|_2^2 - \sum_{i=1}^{m}|c_{LS,i}|^2 \ge 0.
\]
The energy of the original signal is never exceeded by the energy of the LS coefficients
when applying an orthonormal basis.
Since the sequence of estimates is a Cauchy sequence and the Hilbert space is complete, we can conclude that the limit is also in the Hilbert space. However, not every (smooth) function can be approximated by an orthonormal set point by point; such approximations thus need not converge in C[a,b]! Let us now restrict ourselves to approximations in the L2-norm. Even then, not every function can be approximated (well) with a set of orthonormal basis functions, even if $m \to \infty$.
Example 3.13 The set $\{\sin(nt)\}$; $n = 1,2,\ldots,\infty$ forms an orthonormal set. The function $\cos(t)$ cannot be approximated, since all coefficients vanish:
\[
c_n = \int_0^{2\pi} \cos(t)\sin(nt)\,dt = 0.
\]
More generally, the set of sine functions cannot approximate any even function.
We thus require a specific property of orthonormal sets in order to guarantee that every function can be approximated.

Theorem 3.3 For a complete orthonormal set $\{p_i(t)\}$ in S the following equivalent properties hold for every $x(t) \in S$:
\[
x(t) = \sum_{i=1}^{\infty}\langle x(t),p_i(t)\rangle\, p_i(t),
\]
\[
\left\|x(t) - \sum_{i=1}^{n}\langle x(t),p_i(t)\rangle\, p_i(t)\right\| < \epsilon\quad\text{for all } n \ge N,\; N < \infty,
\]
\[
\|x(t)\|^2 = \sum_{i=1}^{\infty}\big|\langle x(t),p_i(t)\rangle\big|^2 \quad(\text{Parseval}),
\]
and there is no nonzero function $f(t) \in S$ for which the set $\{f(t),p_1(t),p_2(t),\ldots\}$ forms an orthonormal set.
It is also said: the orthogonal set of basis functions is complete (Ger.: vollständig). Note
that this is not equivalent to a complete Hilbert space (→ Cauchy)!
It is noteworthy to point out the difference to finite dimensional sets. For finite
dimensional sets it is sufficient to show that the functions pi (t) are linearly independent.
If an infinite dimensional set satisfies the properties of Theorem 3.3, then the represen-
tation of x is equivalently obtained by the infinite set of coefficients ci . The coefficients ci
of a complete set are also called generalized Fourier series.
Lemma 3.2 If two functions x(t) and y(t) from S have a generalized Fourier series rep-
resentation using some orthonormal basis set pi (t) in a Hilbert space S, then:
\[
\langle x(t), y(t)\rangle = \sum_{i=1}^{\infty} c_i b_i.
\]
Proof:
Let
\[
x(t) = \sum_{i=1}^{\infty} c_i\, p_i(t);\qquad y(t) = \sum_{k=1}^{\infty} b_k\, p_k(t).
\]
Then
\[
\langle x(t),y(t)\rangle = \left\langle \sum_{i=1}^{\infty} c_i\, p_i(t),\; \sum_{k=1}^{\infty} b_k\, p_k(t)\right\rangle
= \sum_{l=1}^{\infty} c_l b_l,
\]
which is nothing else but an inner product. The series $x_k, y_k$ can be interpreted as the coefficients $b_k, c_k$, e.g.,
\[
\sum_{k=-\infty}^{\infty} c_k b_k = \langle X, Y\rangle.
\]
The symmetric form with the prefactor $\frac{1}{\sqrt{2\pi}}$ guarantees orthonormality. Other, non-symmetric forms are possible, leading to orthogonal sets. In case the period is not $2\pi$, the basis functions have to be scaled accordingly.
Note that in both transformations often orthogonal rather than orthonormal sets are applied to reduce complexity in one direction. Note further that in the previous two examples the orthogonal functions are constructed from trigonometric functions $e^{jnt}$ and $e^{j2\pi kn/N}$, respectively. The weighting function is thus $w(t) = 1$. We simply find:
\[
\frac{1}{2\pi}\int_0^{2\pi} e^{jnt}\,e^{-jmt}\,dt = \begin{cases}0; & n\ne m\\ 1; & n = m\end{cases}
\]
\[
\frac{1}{N}\sum_{k=0}^{N-1} e^{j2\pi kn/N}\, e^{-j2\pi km/N}
= \begin{cases}0; & \text{else}\\ 1; & |n-m| \bmod N = 0\end{cases}
= \sum_{r=-\infty}^{\infty}\delta_{n-m+rN}.
\]
The continuous functions are thus truly orthonormal, while the time-discrete series show
some periodic behavior. The orthonormality only applies per period.
Example 3.16 We can now return to a problem that remains from the first chapter when
we discussed linear phase filters, see Section 1.2.3. Let us assume an arbitrary causal FIR
filter response of finite length n is given in terms of its z-transform:
\[
F(z) = \sum_{i=0}^{n} f_i\, z^{-i}.
\]
We thus find the real part to be $F^{(e)}(e^{j\Omega})$ and the imaginary part $F^{(o)}(e^{j\Omega})$. The real part is purely written in terms of cos terms and is thus an even function in Ω, the imaginary part an odd function with sin terms. As we now understand that sin(·) and cos(·) terms are perfectly orthogonal to each other, this means that the real part cannot be approximated by the imaginary part. In that sense they are completely independent. As one guarantees an even symmetry and the other an odd symmetry, one can only choose one of them in order to obtain a linear phase filter. For even (symmetric) impulse responses only the real part with cos(·) terms remains; for odd (antisymmetric) impulse responses only the imaginary part with sin(·) terms remains. We can thus conclude that:
Linear phase filters need to have a symmetric impulse response, and symmetric impulse responses result in a linear phase filter.
for i 6= j, where a particular form of positive weighting function w(t) > 0 is commonly
applied. All of those polynomials share a common property.
Proof:
Let $g_n(t)$ be defined via
\[
\underbrace{t\,p_n(t) - a_n\, p_{n+1}(t)}_{g_n(t)} = b_n\, p_n(t) + c_n\, p_{n-1}(t),
\]
with
\[
d_i = \langle g_n(t), p_i(t)\rangle.
\]
Since the polynomials are orthogonal to each other, we have that the coefficients $d_i$ vanish for lower orders. Since $g_n(t) = t\,p_n(t) - a_n\, p_{n+1}(t)$ holds, we must also have:
As the coefficients are simply integer values, the computation with this polynomial family is very easy. In the 19th century, when computers were not available, this polynomial family was often preferred as many computations had to be done manually. The weighting function becomes visible in the inner product:
\[
\int_{-\infty}^{\infty} p_n(t)\,p_m(t)\,\underbrace{\frac{e^{-t^2/2}}{n!\sqrt{2\pi}}}_{w(t)}\,dt = \delta_{n-m}.
\]
a doubly recursive form, recursive in time k as well as in order r. We thus require initial values for both recursions:
\[
x_{-1}^{(r)} = 0;\quad r = 0,1,\ldots,N,
\qquad
x_k^{(0)} = \binom{N}{k};\quad k = 0,1,\ldots,N.
\]
Note that the order r only runs from 0 to N, while k is bounded by N only for the initial values; afterwards k keeps increasing. Applying the z-transform with respect to k we obtain:
Plugging the initial values into the previous equation provides the final explicit solution.
Figure 3.11: Binomial Filter bank (left): magnitude of Fourier transform, (right): normal-
ized version.
The first members of the Legendre family are:
\[
p_0(t) = 1,\qquad p_1(t) = t,\qquad p_2(t) = \tfrac{3}{2}t^2 - \tfrac{1}{2},\qquad p_3(t) = \tfrac{5}{2}t^3 - \tfrac{3}{2}t.
\]
Example 3.20 Tschebyscheff Polynomials
These polynomials are motivated by the design of polynomials that guarantee a limited maximal error. Designs based on them often carry the name equiripple or min-max approach. They include a weighting function:
\[
\int_{-1}^{1} p_n(t)\,p_m(t)\,\underbrace{\frac{1}{\sqrt{1-t^2}}}_{w(t)}\,dt =
\begin{cases}
0; & n\ne m\\
\pi; & n = m = 0\\
\frac{\pi}{2}; & n = m \ne 0
\end{cases}
\]
\[
p_0(t) = 1,\qquad p_1(t) = t,\qquad p_2(t) = 2t^2 - 1,\qquad p_3(t) = 4t^3 - 3t.
\]
Figure 3.12 depicts on the left the first members of the Legendre family and on the right those of the Tschebyscheff family.
\[
\frac{d\phi(t)}{dt} + \phi(t) = 0
\]
with condition
\[
\phi(t = 0) = 1.
\]
The solution is well known:
\[
\phi(t) = e^{-t} = 1 - t + \frac{1}{2}t^2 - \frac{1}{6}t^3 + \ldots
\]
Let us solve it with a polynomial. For this we select simple basis functions: $p_n(t) = t^n$; $n = 0,1,2,\ldots$. We can thus write
\[
\phi(t) = a_0 + a_1 t + a_2 t^2 + \ldots = \sum_n a_n\, p_n(t)
\]
with unknown coefficients $a_n$.
What happens to the basis when we apply a differentiation?
\[
\frac{d\phi(t)}{dt} = a_1 + 2a_2 t + 3a_3 t^2 + \ldots
\]
\[
\begin{bmatrix}0 & 1 & 0 & 0\\ 0 & 0 & 2 & 0\\ 0 & 0 & 0 & 3\end{bmatrix}
\begin{bmatrix}a_0\\ a_1\\ a_2\\ a_3\end{bmatrix}
=
\begin{bmatrix}a_1\\ 2a_2\\ 3a_3\end{bmatrix}.
\]
\[
\frac{d\phi(t)}{dt} + \phi(t) = 1 + \frac{t}{2}
\]
with the initial values $\phi(t=0) = 1.5$ and $\phi'(t=0) = -0.5$. The polynomial on the right-hand side can be written in terms of a vector $[1, 0.5, 0]^T$ and we find
\[
\begin{bmatrix}1 & 1 & 0 & 0\\ 0 & 1 & 2 & 0\\ 0 & 0 & 1 & 3\end{bmatrix}
\begin{bmatrix}a_0\\ a_1\\ a_2\\ a_3\end{bmatrix}
=
\begin{bmatrix}1\\ 0.5\\ 0\end{bmatrix}.
\]
3.6 Wavelets
3.6.1 Pre-Wavelets
Before we consider candidates for wavelets, we reconsider our interpolation function from
Chapter 1.
While the first property displays a desired orthonormality for shifting, the change in
stretching does not deliver an orthogonal basis. If the latter is exploited, a non-diagonal
\[
c_n = \frac{\langle f(t), p_n(t)\rangle}{\langle p_n(t), p_n(t)\rangle} = \langle f(t), p_n(t)\rangle.
\]
We remember from the Sampling Theorem 1.7 that for band-limited functions f(t) the result is surprisingly simple: $c_n = f(nT)$. This inner product is a convolution integral, yet not in the form we are most often used to:
\[
\int_{-\infty}^{\infty} f(\tau)\,\mathrm{sinc}(2B(t-\tau))\,d\tau
= \int_{-\infty}^{\infty} f(\tau)\,\mathrm{sinc}(2B(\tau-t))\,d\tau = f_L(t).
\]
We thus obtain a low pass version $f_L(t)$ of the original function f(t), as depicted in Figure 3.13. If we now compare the result of the convolution integral and the original inner product, we recognize that they are identical once we set $2Bt = n$. The inner product is thus $f_L\!\big(\frac{n}{2B}\big) = f_L(nT) = c_n$. Only if the function is band limited is the approximated (filtered) low pass function identical to the original function, $f_L(t) = f(t)$, allowing us to replicate the function without any loss. If this condition is not satisfied, the higher frequency parts are simply dropped as they do not pass the ideal low pass.
The reconstruction works with zero error only if the function f(t) is bandlimited by $|\omega| < 2\pi B$:
\[
c_n = \langle f(t), p(2Bt - n)\rangle,
\qquad
f(t) = \sum_n c_n\, p(2Bt - n).
\]
Thus, by properly selecting $p_n(t) = p(2Bt - n)$, we can select the space that fits our original signal best.
j
The pre-factor 2− 2 and the stretch factor 2−j appear as a pair so that with arbitrary
values of j ∈ ZZ, all resulting functions are normalized. Note that, if φ(t) is normalized
(kφ(t)k = 1), then we also have kpj,k (t)k = 1. We select the function φ(t) in such way that
they build for all shifts n an orthonormal basis for a space:
constructed out of two unit step functions. Shifted versions of such a pulse span the space
\[
V_0 = \mathrm{span}\{\phi(t-n),\; n \in \mathbb{Z}\}.
\]
With this basis function all functions $f_0(t)$ that are constant on an integer mesh (Ger.: im Raster ganzzahliger Zahlen) can be described exactly. Continuous functions can be approximated with the precision of integer distance.
\[
f_0(t) = \sum_n \langle f(t), p_{0,n}(t)\rangle\, p_{0,n}(t) = f(t) - e_0(t).
\]
The coefficients $\langle f(t), p_{0,n}(t)\rangle$ can also be interpreted as piecewise integrated areas over the function f(t). Figure 3.15 displays the quantization of a continuous function f(t) (left) and the corresponding error term $e_0(t)$ (right).
Figure 3.15: Left: dissection in unit pulses, right: error terms e0 (t).
Stretching can also be used to define new bases for other spaces, for example,
\[
V_{-1} = \mathrm{span}\left\{\sqrt{2}\,\phi(2t - n),\; n\in\mathbb{Z}\right\}.
\]
then we call $\phi(t)$ a scaling function (Ger.: Skalierungsfunktion) for a wavelet. Next to nesting, there are other important properties of $V_m$:
\[
\text{Shrinking:}\quad \bigcap_{m\in\mathbb{Z}} V_m = \{\,\},
\qquad
\text{Closure:}\quad \overline{\bigcup_{m\in\mathbb{Z}} V_m} = L_2(\mathbb{R}),
\]
and the multi-resolution property.
With this function we can represent all functions that are constant on a half-integer (n/2) mesh. All continuous functions can be approximated on a half-integer mesh:
\[
f_{-1}(t) = \sum_n \langle f(t), p_{-1,n}(t)\rangle\, p_{-1,n}(t) = f(t) - e_{-1}(t).
\]
The function $f_{-1}(t)$ thus is an even finer approximation than $f_0(t)$ in $V_0$. Since $V_0$ is a subset of $V_{-1}$ we have:
\[
f_{-1}(t) = \sum_n c_n^{(-1)}\, p_{-1,n}(t)
= \sum_n c_n^{(0)}\, p_{0,n}(t) + e_{0,-1}(t)
= \sum_n c_n^{(0)}\, p_{0,n}(t) + \sum_n d_n^{(0)}\,\psi_{0,n}(t)
\]
with a suitable basis $\psi_{0,n}(t)$ from $W_0$ with $W_0 \oplus V_0 = V_{-1}$. In other words, the set $W_j$ complements the set $V_j$ in such a way that
\[
W_j \oplus V_j = V_{j-1},
\]
so that with $V_{j-1}$ the next finer approximation can be built. Hereby, $W_j$ is the orthogonal complement of $V_j$:
\[
W_j = V_j^{\perp} \subset V_{j-1}.
\]
This is illustrated in Figure 3.16.
We thus learned that the error terms lie in complement spaces to the corresponding signal spaces. The basis functions $\psi_{j,n}(t)$ describing the error terms are called wavelets. It is a word creation from the French word ondelette (small wave), coined by Jean Morlet and Alex Grossmann around 1980. Thus, we can decompose any function at an arbitrary scaling step into two components $\{\psi_{j,n}\}$ and $\{p_{j,n}\}$. Very roughly, one can be considered a high pass, the other a low pass. By finer scaling the function can be approximated better and better. The required number of coefficients is strongly dependent on the wavelet or the corresponding scaling function.
Figure 3.16: The finer space $V_{j-1}$ can be described by the vector space $V_j$ and its orthogonal complement.
Wavelets were in the research focus in the 1980s and 1990s. In the 1980s, in parallel, subband coding was popular, and at first there was no obvious reason to favor one of the two, as with the existing wavelets equivalent formulations were possible. This view (at the end of the 1980s), however, did not reveal the true potential of wavelets, as they only offered equivalent performance. This situation changed when Ingrid Daubechies⁸ introduced new families of
⁷Haar wavelets were introduced in 1909 by the Hungarian mathematician Alfréd Haar (11.10.1885–16.3.1933).
⁸Ingrid Daubechies (17.8.1954) is a Belgian physicist and mathematician.
Figure 3.17: Unit pulse p0,0 (t) ∈ V0 and p−1,0 (t) ∈ V−1 as well as wavelet ψ0,0 (t).
wavelets, some of them not having the orthogonality property but a so-called bi-orthogonality property (see Definition 2.30). These wavelets found application in the JPEG2000 standard for image coding.
3.7 Exercises
Exercise 3.1 Consider projection matrices
3. Show that the eigenvalues of a projection matrix are either zero or one.
Figure 3.18: Function $f_{-1}(t) \in V_{-1}$ is approximated by $f_0(t) \in V_0$ with error $e_{0,-1}(t)$.
Chapter 4
Linear Operators
Note that in the mathematical literature the number field (Ger.: Zahlenkörper) can be
arbitrary and needs to be defined. In engineering the number field is typically always real
numbers. This can lead to situations where operators are named linear in the mathematical
literature while in engineering they are not and vice versa, see, e.g., Example 4.6.
We thus have a mapping from $\mathbb{C} \to \mathbb{R}^2$.
Example 4.2 A quadruple $s = [s_1,s_2,s_3,s_4] \in \mathbb{C}^4$ is mapped onto a 4 × 4 matrix in $\mathbb{C}^{4\times 4}$:
\[
A[s] = \begin{bmatrix}
s_1 & s_2 & s_3 & s_4\\
-s_2^* & s_1^* & -s_4^* & s_3^*\\
s_3^* & s_4^* & s_1^* & s_2^*\\
s_4 & -s_3 & s_2 & -s_1
\end{bmatrix}.
\]
Example 4.3 Let a continuous function g(t) from C[0,1] be sampled at fixed time points $0 < t_1 < t_2 < \ldots < t_n < 1$:
Example 4.4 A function $f : X \to \mathbb{R}\,(\mathbb{C})$ that maps from a vector space X onto real (complex) numbers is called a functional. If it is linear, then it is called a linear functional:
\[
f_1(x) = \frac{1}{T}\int_0^T x(t)\,dt,\qquad
f_2(x) = \int_a^b x(t)\,g(\tau - t)\,dt,\qquad
f_3(x) = \int_{-\infty}^{\infty} x(t)\,e^{-j\omega t}\,dt.
\]
More formally we can define a linear functional as a mapping with a particular property:
• Remark 1: all inner products over functions can be interpreted as linear functionals, e.g.,
\[
f_2(x) = \int_a^b x(t)\,g(\tau - t)\,dt = \langle x(t), g(\tau - t)\rangle.
\]
• Remark 2: all continuous, linear functionals in a Hilbert space can be described by inner products (Riesz' theorem).
Note that we introduced induced vector norms to describe norms on matrices. We can also induce norms on linear operators:
\[
\|A[\,.\,]\|_p = \|A[\,.\,]\|_{p,\mathrm{ind}}
= \sup_{x\ne 0}\frac{\|A[x]\|_p}{\|x\|_p}
= \sup_{x\ne 0}\left\|A\!\left[\frac{x}{\|x\|_p}\right]\right\|_p
= \sup_{\|x\|_p = 1}\|A[x]\|_p.
\]
Example 4.5 Consider the causal sequence $x_k$; $k = 0,1,2,\ldots$. The mapping of the sequence onto a sum
\[
s_k = \sum_{l=0}^{k} x_l
\]
is a linear operator.
Example 4.6 Consider the Hermitian¹ operator $H[A] = A^H$, which transposes a matrix and additionally builds the complex conjugate of all elements (Ger.: adjungierte Matrix, Engl.: adjoint matrix):
\[
B_1 = A_1^H = H[A_1];\qquad B_2 = A_2^H = H[A_2],
\]
\[
\alpha_1 B_1 + \alpha_2 B_2 = \alpha_1 A_1^H + \alpha_2 A_2^H = (\alpha_1 A_1 + \alpha_2 A_2)^H;\qquad \alpha_{1,2}\in\mathbb{R}.
\]
Another example is an operator on continuous functions that maps them to even functions by
\[
P_e[f(x)] = \frac{f(x) + f(-x)}{2}.
\]
Correspondingly, there also exists an operator that maps onto odd functions.
Proof:
Let us assume A is bounded; then we find that
\[
\|A[x] - A[x+dx]\| = \|A[dx]\| \le L\,\|dx\|
\]
for all x from X. This, however, is equivalent to being continuous. Now, starting with continuity, we find
\[
\|A[dx]\| = \|A[x] - A[x + dx]\| \le L\,\|dx\|
\]
and we conclude boundedness.
This theorem is true for linear operators, thus also for linear functionals.
Proof:
If $x_n$ converges, it must also be bounded, thus
\[
\|x_n\| \le M < \infty.
\]
Then we have:
\[
|\langle x_n, y\rangle - \langle x, y\rangle| = |\langle x_n - x,\, y\rangle| \le \|x_n - x\|\,\|y\|.
\]
Since $\|x_n - x\| \to 0$, $\langle x_n, y\rangle$ converges towards $\langle x, y\rangle$.
With this technique we can also show the following: let $f(x) = \langle x, g\rangle$ be a functional; then f(x) is continuous if g is bounded.
Proof:
\[
|\langle x_n, g\rangle - \langle x, g\rangle| = |\langle x_n - x,\, g\rangle| \le \|x_n - x\|\,\|g\|.
\]
If $\|x_n - x\| \to 0$, then $f(x_n) = \langle x_n, g\rangle$ converges towards $f(x) = \langle x, g\rangle$.
Theorem 4.2 (Inverse Operator) Let $\|\cdot\|$ be an operator norm satisfying the submultiplicative property and $A[.] : X \to X$ a linear operator with $\|A[.]\| < 1$. Then $(I - A)^{-1}$ exists² and:
\[
(I-A)^{-1} = \sum_{i=0}^{\infty} A^i,
\qquad
A^{-1} = \sum_{i=0}^{\infty}(I-A)^i.
\]
Proof:
Let $\|A\| < 1$. If $I - A$ were singular, then there is at least one vector x unequal to 0 so that $(I - A)[x] = 0$. Thus we also have $x = A[x]$ and $\|x\| = \|A[x]\| \le \|A\|\,\|x\| < \|x\|$, a contradiction; hence $I - A$ is invertible. Furthermore,
\[
(I - A)(I + A + A^2 + \ldots + A^{k-1}) = I - A^k.
\]
Since
\[
\|A^k\| \le \|A\|^k
\]
(by the submultiplicative property) and $\|A\| < 1$, it must be true that
\[
\lim_{k\to\infty}\|A\|^k = 0.
\]
²Note that we use the matrix notation $A^i$ to describe i successive applications of the operator A[.].
or, equivalently, for the operator $A[(a,b)^T] = \tfrac12\,(a,\, b^*)^T$ considered here,
\[
\sum_{i=0}^{\infty} A^i (I-A)\begin{bmatrix}a\\ b\end{bmatrix} = \begin{bmatrix}a\\ b\end{bmatrix}.
\]
We intend to show the latter. Note that for even exponents we find:
\[
A^0\begin{bmatrix}a\\ b\end{bmatrix} = \begin{bmatrix}a\\ b\end{bmatrix},\qquad
A^2\begin{bmatrix}a\\ b\end{bmatrix} = \begin{bmatrix}\frac{a}{4}\\[2pt] \frac{b}{4}\end{bmatrix},\qquad
A^{2k}\begin{bmatrix}a\\ b\end{bmatrix} = \begin{bmatrix}\frac{a}{2^{2k}}\\[2pt] \frac{b}{2^{2k}}\end{bmatrix}.
\]
Finally, with $(I-A)[a,b]^T = [\,a/2,\; b - b^*/2\,]^T$,
\[
\sum_{i=0}^{\infty} A^i (I-A)\begin{bmatrix}a\\ b\end{bmatrix}
= \frac{4}{3}\begin{bmatrix}\frac{a}{2}\\[2pt] b - \frac{b^*}{2}\end{bmatrix}
+ \frac{2}{3}\begin{bmatrix}\frac{a}{2}\\[2pt] b^* - \frac{b}{2}\end{bmatrix}
= \begin{bmatrix}a\\ b\end{bmatrix}.
\]
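For matrices, the statement of Theorem 4.2 can be checked numerically; the following sketch (not from the script) sums the Neumann series for a hypothetical random matrix rescaled to spectral norm below one:

```python
# Neumann series: sum_i A^i converges to (I - A)^{-1} when ||A|| < 1.
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 4))
A *= 0.9 / np.linalg.norm(A, 2)          # rescale so the spectral norm is < 1

S = np.zeros((4, 4))
Ai = np.eye(4)
for i in range(200):
    S += Ai
    Ai = Ai @ A                           # next power A^{i+1}

print(np.allclose(S, np.linalg.inv(np.eye(4) - A)))   # True
```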
Definition 4.4 (Range or Column Space) The vector space, spanned by the columns of
a matrix A = [a1 ,a2 ,..,an ] : X → Y is called its column space or range (Ger.: Spaltenraum
von A):
The second row is the more general form and describes the column space of a linear operator.
Definition 4.5 (Row Space) The vector space spanned by the (complex conjugate) rows of a matrix $A = [b_1^T; b_2^T; \ldots; b_n^T] : X \to Y$ is called the row space of A (Ger.: Zeilenraum) or column space of the adjoint operator $A^*[.]$:
\[
\mathcal{R}(A^*[\,.\,]) = \mathrm{span}\big(\{b_1^*, b_2^*, \ldots, b_n^*\}\big)
= \{x \in X : A^*[y] = x \text{ for } y \in Y\}.
\]
Note that the Hermitian of a matrix is a special form of the adjoint (Ger.: adjungierter) linear operator: $A^*[x] = A^H x$.
Definition 4.6 (Adjoint Operator) An adjoint operator $A^*[.]$ for a given operator A[.] satisfies the following property:
\[
\langle A[x], y\rangle = \langle x, A^*[y]\rangle.
\]
Note that x and y can be of different size and even of different type! While the adjoint operator for a matrix is very simple to find ($A^*[\cdot] = A^H\cdot$), it is in general difficult to determine.
Example 4.8 (Adjoint Operator) Let us consider the following linear operator A[.] that maps a continuous function $x(t) \in C[-\infty,\infty]$ to a sequence $x_k \in l_2(\mathbb{Z})$:
\[
A[x(t)] = \int_{k-\frac12}^{k+\frac12} x(t)\,dt = z_k.
\]
We now recognize that $\sum_{k=-\infty}^{\infty}\int_{k-\frac12}^{k+\frac12} = \int_{-\infty}^{\infty}$ and we obtain the same result if we display $\hat x(t)$ as small parts of unit length (T = 1). We thus find
\[
\hat x(t) = \sum_{k=-\infty}^{\infty} y_k\,\mathrm{rec}(t - k)
\]
Definition 4.8 (Nullspace) The vector space defined by the solutions A[x] = 0 of a linear
operator A[.] : X → Y is called nullspace N (A) of A (Ger.: Nullraum). It is also called
kernel (Ger.: Kern) of the operator: ker(A[.]).
Definition 4.9 (Left Nullspace) The vector space defined by the solutions A∗ [y] = 0 of
a linear operator A[.] : X → Y is called nullspace N (A∗ ) of A∗ [.] or left nullspace (Ger.:
linker Nullraum), equivalently ker(A∗ [.]).
Now we reformulate the matrix into its adjoint and obtain the row space and the left nullspace of A:
\[
\mathcal{R}(A^H) = \mathrm{span}\left\{\begin{bmatrix}1\\0\\0\end{bmatrix}, \begin{bmatrix}1\\0\\1\end{bmatrix}\right\},
\qquad
\mathcal{N}(A^H) = \mathrm{span}\left\{\begin{bmatrix}0\\0\\1\end{bmatrix}\right\}.
\]
The nullspace of the linear operator L (linear functional) consists of all functions x(t) which
convolved with h(t) result in zero. In the Fourier domain these are the functions X(jω)
that have no overlap with H(jω). Thus:
Let us revisit Example 2.33 that we can solve now. We were given a set of equations
Rg = 0.
As the columns in R are linearly dependent, we now understand that the desired solution
must come from the nullspace of R. By properly selecting the number of channel taps (m)
and the number of observations N , we can find unique solutions, that is there is only one
vector that spans the nullspace.
Another important example of nullspaces occurs when solving linear sets of equations.
Let vector b be from the column space of A. Then we have: A linear combination of the
columns of A must be exactly b : Ax = b.
Given Ax = b,
• There is exactly one solution if b is in the column space of A and the columns are
linearly independent.
R(A[.]) ⊂ Y
R(A∗ [.]) ⊂ X
N (A[.]) ⊂ X
N (A∗ [.]) ⊂ Y
Example 4.11 In this (somewhat larger) example we want to show the importance of understanding the nullspaces. We consider the application of the hands-free telephone, a common feature of a handset today, used not only in private conversations but predominantly in video conferences and remote team work. It took, however, a large amount of research work in the 1980s and 1990s to make what we use today work perfectly. Consider the setup as shown in Figure 4.1. The far-end speaker signal enters the room via the loudspeaker, is reflected in the room and returns into the microphone signal of the local speaker together with his voice. Both signals are transferred to the far-end speaker, where his own signal appears as an echo. In case the far-end speaker is also using a hands-free telephone, the loop is closed and a loud sinusoidal sound becomes audible (Ger.: Rückkopplungspfeifen). An adaptive filter can estimate the loudspeaker-room impulse response and reconstruct the echo signal in order to subtract it. Important are the very long impulse responses of typically several thousand taps, depending on the room size. In this application the local speaker is the disturbance (for the adaptive filter estimation), but as it is the signal of interest to be transmitted it requires special treatment.
Figure 4.1: Hands-free telephone setup: the loudspeaker signal is reflected in the room and picked up by the microphone together with the local speaker's voice.
with the autocorrelation function rxx (t) of the input signal and the crosscorrelation rxy (τ )
160 Signal Processing 1
Figure 4.2: Echo cancellation structure: the far-end signal $x_k$ passes the room impulse response h and is picked up (together with the local signal $v_k$) as $y_k$; an adaptive filter $\hat w$ forms the estimate $\hat y_k$, which is subtracted to yield the error $e_k$.
describing the correlation between input and output. Such an integral can be discretized and written in the form of a linear set of equations:
\[
R_{xx}\, w = r_{xy}.
\]
Not knowing the correlation terms, the linear system of equations can be approximated by observations:
\[
\left(\frac{1}{n}\sum_{k=1}^{n} x_k x_k^T\right) w = \frac{1}{n}\sum_{k=1}^{n} x_k\, y_k.
\]
In order to solve a system of order m ($\dim(w) = m$), n must be at least m. In practice, often n = 2m, and $x_k$ must be persistently exciting! Then we find
\[
w = \left(\frac{1}{n}\sum_{k=1}^{n} x_k x_k^T\right)^{-1}\left(\frac{1}{n}\sum_{k=1}^{n} x_k\, y_k\right).
\]
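A minimal sketch (not from the script, with a hypothetical short echo path) of this sample-average solution follows:

```python
# Approximating R_xx w = r_xy by sample averages (single-microphone case).
import numpy as np

rng = np.random.default_rng(7)
w_true = np.array([0.5, 0.3, -0.2, 0.1])       # hypothetical room impulse response, m = 4
m, n = 4, 4000
x = rng.standard_normal(n + m)                  # persistently exciting far-end signal
X = np.column_stack([x[m - i - 1 : m - i - 1 + n] for i in range(m)])  # regressor rows
y = X @ w_true + 0.05 * rng.standard_normal(n)  # microphone signal with local noise

Rxx = X.T @ X / n
rxy = X.T @ y / n
w_hat = np.linalg.solve(Rxx, rxy)
print(w_hat)                                     # close to w_true
```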
The problems with nullspaces, however, only became an issue once stereo applications were to be developed. Simply replicating the well-working techniques for single-microphone systems turned out not to work. At first this was quite puzzling, as a clear explanation was neither found nor obvious. It took a few years and the right formulation of the problem to understand it better. For this, consider now the problem depicted in Figure 4.3, where the far-end speaker signal $s_k$ is recorded at two microphones and transmitted individually over the channels $g_k^{(1)}$ and $g_k^{(2)}$.
Figure 4.3: Problem of stereo echo compensation. The left part explicitly shows the air transmission from source to microphone, a microphone converting air pressure into electrical signals and an electrical transmission path F. On the right this is abstracted as one transmission path with impulse response $g_k^{(1)}$.
we find that this so constructed autocorrelation matrix must be singular! The proof of this fact is easily performed by constructing a vector $u^T = \alpha[g^{(2)T}, -g^{(1)T}]$. Applying this vector u from the right to the autocorrelation matrix, we find that the result is zero. In other words, the vector lies in the nullspace of $R_{xx}$. Clearly, solving the stereo problem with the same techniques would not work. Eventually, other techniques were developed that circumvented this problem and made hands-free telephony possible for stereo applications as well.
Definition 4.10 (Inner Sumspace) Let V and W be linear subspaces, then the space
S = V + W is called inner sumspace consisting of all combinations x = v + w.
Definition 4.11 (Direct Sumspace) Let V and W be linear subspaces. The direct
sumspace is constructed by the pairs (v,w).
Definition 4.12 (Orthogonal Subspace) Let S be a vector space and V and W both
subspaces of S. V and W are called orthogonal subspaces if for each pair v from V and w
from W we have: hv,wi = 0.
V ⊥ = W.
\[
V \cup V^\perp = \{(0,0,0),(1,0,0),(0,1,0),(0,0,1),(0,1,1)\},
\]
obviously not the entire vector space S. If, on the other hand, we build the linear hull with these elements, we find $\mathrm{span}(V \cup V^\perp) = S$, a result we also obtain by the inner sumspace: $V + V^\perp = S$. If, however, we build the direct sumspace, we obtain something entirely different:
\[
V \oplus V^\perp = \left\{\begin{array}{l}(0,0,0,0,0,0),(1,0,0,0,0,0),(0,0,0,0,1,0),(1,0,0,0,1,0),\\(0,0,0,0,0,1),(1,0,0,0,0,1),(0,0,0,0,1,1),(1,0,0,0,1,1)\end{array}\right\}.
\]
Example 4.13 Let S = GF(2)³. The vectors v = (1,0,0) and w = (0,0,1) are from S. They span the subspaces V and W: V = span(v) = {(0,0,0),(1,0,0)} and W = span(w) = {(0,0,0),(0,0,1)}. Both are orthogonal subspaces. The subspace V has the orthogonal complement
\[
V^\perp = \{(0,0,0),(0,1,0),(0,0,1),(0,1,1)\}.
\]
It can be written as $V^\perp = \mathrm{span}(\{(0,1,0),(0,0,1)\})$, or with any pair that does not include the zero vector. As in the previous example, merging V with its orthogonal complement does not yield the entire vector space S. To obtain S we need to take the inner sumspace of the two.
Note: let v be from V and w from W, and assume that V and W are orthogonal complements in S. Then we do not necessarily have
\[
V \cup V^\perp = S,\qquad \mathrm{span}\big(V \cup V^\perp\big) = S,\qquad V + V^\perp = S,
\]
although this may appear intuitive. Typically, for such properties we need complete spaces (Cauchy sequences!).
We can now also return to the concept of projections, as we understand them even better. We already mentioned that there are different kinds of projections.
Definition 4.14 In an orthogonal projection its range and its nullspace are orthogonal
subspaces, in oblique (non-orthogonal) projections, this is not the case.
Note that the eigenvalues of a projection matrix are either zero or one.
Theorem 4.3 Let V and W be two subspaces of a vector space S (not necessarily a complete one) with inner product. Then we have:
1. $V^\perp$ is a complete subspace of S.
2. $V \subset V^{\perp\perp}$.
3. If $V \subset W$, then $W^\perp \subset V^\perp$.
4. $V^{\perp\perp\perp} = V^\perp$.
5. If $x \in V \cap V^\perp$, then $x = 0$.
Theorem 4.4 Let $A : X \to Y$ be a bounded linear operator between two Hilbert spaces X and Y, and let R(A[.]) as well as R(A*[.]) be complete subspaces. Then we have:
\[
\mathcal{N}(A[.]) = \mathcal{R}(A^*[.])^\perp,\quad \mathcal{N}(A^*[.]) = \mathcal{R}(A[.])^\perp,\quad
\mathcal{R}(A[.]) = \mathcal{N}(A^*[.])^\perp,\quad \mathcal{R}(A^*[.]) = \mathcal{N}(A[.])^\perp.
\]
Proof: Let us have a look at the first property. Let $x \in \mathcal{N}(A[.])$. But then $\langle A[x], y\rangle = \langle x, A^*[y]\rangle = 0$ holds for all y. This means that x is orthogonal to $\mathcal{R}(A^*[.])$, and thus $x \in \mathcal{R}(A^*[.])^\perp$. Since this holds for every $x \in \mathcal{N}(A[.])$, also $\mathcal{N}(A[.]) \subseteq \mathcal{R}(A^*[.])^\perp$ must hold.
If the argumentation is started with $x \in \mathcal{R}(A^*[.])^\perp$, it can be concluded that $x \in \mathcal{N}(A[.])$, and thus $\mathcal{R}(A^*[.])^\perp \subseteq \mathcal{N}(A[.])$. Accordingly, the only possibility is that $\mathcal{N}(A[.]) = \mathcal{R}(A^*[.])^\perp$.
In particular for matrices: the equation $Ax = b$ has (at least) one solution if and only if $b^H v = 0$ for each vector v for which $A^H v = 0$.
Proof:
Let $A[x] = b$ and $v \in \mathcal{N}(A^*[.])$. Then $\langle b, v\rangle = \langle A[x], v\rangle = \langle x, A^*[v]\rangle = \langle x, 0\rangle = 0$. Consider now that $\langle b, v\rangle = 0$ for all v from $\mathcal{N}(A^*[.])$, but assume $A[x] = b$ had no solution. Since b is then not from $\mathcal{R}(A[.])$, we decompose $b = b_r + b_0$, with $b_r$ from $\mathcal{R}(A[.])$ and $b_0$ orthogonal to the vectors from $\mathcal{R}(A[.])$. Thus we have $\langle A[x], b_0\rangle = 0$ for all x and therefore $A^*[b_0] = 0$. Moreover, $b_0$ is in $\mathcal{N}(A^*[.])$ and $\langle b, b_0\rangle = \|b_0\|^2 \ne 0$, contradicting the assumption that $\langle b, v\rangle = 0$ for all $v \in \mathcal{N}(A^*[.])$.
Example 4.15 Let us consider the following example. We want to solve the set of linear equations
\[
Ax = \begin{bmatrix}1 & 4\\ 2 & 5\\ 3 & 6\end{bmatrix} x = \begin{bmatrix}5\\7\\9\end{bmatrix}.
\]
We analyze matrix A and find
\[
\mathcal{R}(A) = \mathrm{span}\left\{\begin{bmatrix}1\\2\\3\end{bmatrix}, \begin{bmatrix}4\\5\\6\end{bmatrix}\right\}
\quad\text{and}\quad
\mathcal{N}(A^H) = \mathrm{span}\left\{\begin{bmatrix}-1\\2\\-1\end{bmatrix}\right\}.
\]
As $b \in \mathcal{R}(A)$ we conclude that there must be at least one solution. Since the left nullspace contains a vector different from the zero vector, the range of A does not cover the whole space, i.e., not every right-hand side leads to a solvable system. According to Fredholm's theorem, we can simply test solvability by computing the inner product of the vector from the left nullspace with the right-hand side b; we find it to be zero, so there must be at least one solution. As the columns of A are linearly independent, this solution is moreover unique.
Theorem 4.6 The solution of $A[x] = b$ is unique if and only if the unique solution of $A[x] = 0$ is $x = 0$, thus $\mathcal{N}(A[.]) = \{0\}$.
Proof
Let us assume the solution x of A[x] = b is not unique. Then a second solution must exist,
say x + ∆x for which we obtain the same right hand side, i.e.,
A[x + ∆x] = b.
Due to the linearity of operators, we have A[x + ∆x] = A[x] + A[∆x]. As A[x] = b, we
conclude that $A[\Delta x] = 0$. Thus, $\Delta x$ must come from the nullspace of the operator A. If the nullspace contains only the zero vector, then $\Delta x = 0$ and the solution is indeed unique. If the nullspace contained other vectors, uniqueness would not hold.
Definition 4.15 (Rank) The rank (Ger.: Rang) of an operator A[.] is defined by the
dimension of its column space (row space), thus the number of linearly independent columns
(or rows).
We find
\[
\mathcal{R}(A) = \mathrm{span}\left\{\begin{bmatrix}1\\5\end{bmatrix}, \begin{bmatrix}2\\5\end{bmatrix}\right\};\qquad
\mathcal{N}(A^T) = \{0\}
\]
and
\[
\mathcal{R}(A^T) = \mathrm{span}\left\{\begin{bmatrix}1\\4\\2\end{bmatrix}, \begin{bmatrix}5\\20\\5\end{bmatrix}\right\};\qquad
\mathcal{N}(A) = \mathrm{span}\left\{\begin{bmatrix}-4\\1\\0\end{bmatrix}\right\}.
\]
The rank of A is r = 2, as there are two linearly independent vectors spanning the column space and the row space of the matrix.
N (B) ⊂ N (AB)
R(AB) ⊂ R(A)
N (AH ) ⊂ N ((AB)H )
R((AB)H ) ⊂ R(B H ).
Proof, part 1:
If B[x] = 0, then A[B[x]] = 0. Thus, every x from N (B) is also in N (AB).
rank(A[B[.]]) ≤ rank(A[.]),
rank(A[B[.]]) ≤ rank(B[.]).
Example 4.18 Let us reconsider LS solutions for sets of linear equations. In a first example we assume an overdetermined system $Ax = b$ (M > N), i.e.,
\[
\begin{bmatrix}a_{11} & \cdots & a_{1N}\\ a_{21} & \cdots & a_{2N}\\ \vdots & & \vdots\\ a_{M1} & \cdots & a_{MN}\end{bmatrix}
\begin{bmatrix}x_1\\ \vdots\\ x_N\end{bmatrix}
=
\begin{bmatrix}b_1\\ b_2\\ \vdots\\ b_M\end{bmatrix}.
\]
AH Ax = AH b.
If $A^HA$ is (close to) singular, a regularized system $(A^HA + \epsilon I)x = A^Hb$ is solved instead, in which a small, positive $\epsilon$ is introduced. Its value is chosen so that the system of equations can be solved numerically. Note, however, that with such regularization the problem has been altered.
Let us now do the same for the corresponding underdetermined problem, i.e., N > M. The underdetermined LS solution is given by
\[
x = A^H\big(AA^H\big)^{-1} b.
\]
If $\mathrm{rank}(AA^H) = M$, the solution is the well-known minimum norm solution. If, on the other hand, $\mathrm{rank}(AA^H) < M$, the inverse cannot be computed. Again, regularization of $AA^H$ may help in this case.
• LU: stands for lower- and upper-triangular. With A = LU, the system Ax = b can be solved more easily, since LUx = b is split into Lc = b and Ux = c, i.e., two triangular systems of equations that are easy to solve.
In the following, after some matrix basics, we treat the eigenvalue decomposition first and will then show the singular value decomposition.
Let A be an m × m matrix over ℂ. Consider the linear equation
\[
Au = \lambda u
\]
or equivalently
(A − λI)u = 0.
Here the trivial solution u = 0 is not of interest but the nullspace of A − λI. Partic-
ular values λ, generating non-trivial nullspaces, are called eigenvalues. The corresponding
vectors u, are called eigenvectors.
Example 4.19 Let a linear, time-invariant system be described by the following equations in state space:
\[
z_{k+1} = A z_k + B x_k,
\qquad
y_k = C z_k = H(q)[x_k].
\]
We find the equivalent linear operator to be $H(q)[x_k] = C(q)(qI - A)^{-1}B(q)[x_k]$. In case of matrices A, B, C we can write $H(q) = C(qI - A)^{-1}B$. Since the matrix inversion of $(qI - A)$ determines the dynamic and stability behavior of the system, so does its determinant $\det(\lambda I - A)$.
Lemma 4.3 If the eigenvalues of a matrix A are all different then the corresponding eigen-
vectors are all linearly independent.
Proof:
We start with m = 2 and assume the opposite: let the eigenvectors $u_1$ and $u_2$ be linearly dependent,
\[
c_1 u_1 + c_2 u_2 = 0,
\]
with $c_1, c_2$ not both zero. Applying A yields
\[
c_1 A u_1 + c_2 A u_2 = c_1\lambda_1 u_1 + c_2\lambda_2 u_2 = 0.
\]
Subtracting $\lambda_2$ times the first equation,
\[
c_1\lambda_1 u_1 + c_2\lambda_2 u_2 - \big(c_1\lambda_2 u_1 + c_2\lambda_2 u_2\big) = c_1(\lambda_1 - \lambda_2)\,u_1 = 0.
\]
Since $\lambda_1$ and $\lambda_2$ are different and $u_1$ is not the zero vector, we must have $c_1 = 0$. A similar argument leads to $c_2 = 0$. This proves that the two eigenvectors are linearly independent. For m > 2 we always consider the case that two vectors are linearly dependent and prove the contradiction.
Example 4.20 If the eigenvalues are not different, the eigenvectors can be linearly dependent or not. Consider the following matrices:
\[
A = \begin{bmatrix}4 & 0\\ 0 & 4\end{bmatrix};\qquad B = \begin{bmatrix}4 & 1\\ 0 & 4\end{bmatrix}.
\]
Both matrices have the same eigenvalues $\lambda_1 = \lambda_2 = 4$. The eigenvectors of matrix A are linearly independent, those of B are not.
Example 4.21 Check whether the following theorem holds: ‘A square n × n matrix A has
linearly independent columns if and only if all of its eigenvalues are non-zero.’
Remark: Two matrices are called similar if they have the same eigenvalues. A matrix
transformation that does not change the eigenvalues is called a similarity transformation
(Ger: Ähnlichkeitstransformation), see also Section 1.1.4.
If the eigenvectors are not all linearly independent, a diagonalisation is not possible. However, a close-to-diagonal form is possible, the so-called Jordan form: $A = TJT^{-1}$. Here matrix J is of block-diagonal form with blocks $J_i$:
\[
J_i = \begin{bmatrix}\lambda_i & 1 & & \\ & \lambda_i & 1 & \\ & & \ddots & 1\\ & & & \lambda_i\end{bmatrix}. \tag{4.1}
\]
We find
\[
J(B) = \begin{bmatrix}3 & 1 & 0\\ 0 & 3 & 0\\ 0 & 0 & 3\end{bmatrix},
\]
and we need to complement T to
\[
T = \begin{bmatrix}1 & 0 & 0\\ 0 & 1 & 1\\ 0 & 1 & 0\end{bmatrix}.
\]
By adding a linearly independent vector, we can guarantee T to be regular again and thus
its inverse exists. Note that as before we have B m = T J m T −1 but now J m does not preserve
its particular form, thus is not necessarily diagonal or of Jordan form.
Theorem 4.8 (Cayley Hamilton) Each square matrix satisfies its own characteristic
equation.
Proof:
Not too difficult. Try it yourself.
Note that the order of the polynomial depends on the size of the Jordan block.
Proof:
\[
\langle Au, u\rangle = \lambda\,\langle u, u\rangle
= \langle u, A^H u\rangle = \langle u, Au\rangle = \lambda^*\,\langle u, u\rangle.
\]
Thus, $\lambda = \lambda^*$.
Proof:
Let $\lambda_1$ and $\lambda_2$ be two different (real) eigenvalues with corresponding eigenvectors $u_1$ and $u_2$. Then we have
\[
\lambda_2\,\langle u_1, u_2\rangle = \langle u_1, Au_2\rangle = \langle Au_1, u_2\rangle = \lambda_1\,\langle u_1, u_2\rangle,
\]
which for $\lambda_1 \ne \lambda_2$ is only possible if $\langle u_1, u_2\rangle = 0$.
Proof:
Consider an arbitrary matrix S that diagonalizes a Hermitian matrix A: $S^{-1}AS = \Lambda$. Due to the Hermitian property we have $A^H = A$, and the eigenvalues remain real, $\Lambda^H = \Lambda$:
\[
\big(S^{-1}AS\big)^H = S^H A S^{-H} = \Lambda^H = \Lambda = S^{-1}AS.
\]
1
We thus conclude that the square root of B is given by A = U Λ 2 U H , which exists as long
as none of the eigenvalues is negative. Those matrices B are positive semi-definite. As A
itself is Hermitian, we also find that B = A2 = AAH = AH A = A2H .
However, consider now an arbitrary rectangular shaped matrix à that is not Hermitian.
If we construct B = ÃÃH , we obtain a positive semi-definite Hermitian matrix B = B H .
If we now compute the square roots of B, we obtain different square shaped matrices:B =
AAH , say: A 6= Ã.
Example 4.24 In some applications where the square roots are of interest, some additional parts are added on the main diagonal. If white noise is involved a scaled identity matrix is added, similarly if regularization is applied for numerical reasons:
\[
B + \delta I = U(\Lambda + \delta I)U^H = U(\Lambda+\delta I)^{\frac12}U^H\; U(\Lambda+\delta I)^{\frac12}U^H.
\]
We can thus immediately observe the influence of $\delta > 0$ on the square roots.
A puzzling result is obtained when we consider the following Hermitian positive semi-definite matrix: $B = \mathbf{1}\mathbf{1}^T$. We want to factorize:
\[
B + \gamma^2 I = AA^H.
\]
We find the two eigenvalues of B to be zero and two. The corresponding eigenvectors are
\[
U = \frac{1}{\sqrt2}\left[\begin{bmatrix}1\\-1\end{bmatrix}, \begin{bmatrix}1\\1\end{bmatrix}\right].
\]
We can thus write
\[
B + \gamma^2 I = U\begin{bmatrix}\gamma^2 & 0\\ 0 & 2+\gamma^2\end{bmatrix}U^H.
\]
This can now be factorized into
\[
A = \frac{\sqrt{1+\gamma^2/2}}{\sqrt2}\begin{bmatrix}1\\1\end{bmatrix}\begin{bmatrix}1\\1\end{bmatrix}^{T}
+ \frac{\gamma}{2}\begin{bmatrix}1\\-1\end{bmatrix}\begin{bmatrix}1\\-1\end{bmatrix}^{T}.
\]
The following property of linear operators is also very surprising. In general if an input
vector is applied to a matrix, all its columns are linearly combined. Only if zero entries
are in the input vector, some columns are left out. With some vectors, however, this is
different. When applying a linear combination of eigenvectors, a new linear combination of
such eigenvectors results. In other words, once we confine the input to be in a span of a
subset of eigenvectors, we stay in that subspace. The subspace is thus invariant.4
Example 4.25 Let an n × n matrix A have k (smaller than n) different eigenvalues with
the corresponding eigenvectors ui , i = 1,2,...,n. Let U = [u1 , u2 ,..., un ] and Ui ; i = 1,2,...,k,
be the k subsets of eigenvectors, corresponding to the k eigenvalues λi , i = 1,2,...,k. The
subspaces span(Ui ) spanned by the subsets Ui are invariant subspaces of A. For example,
consider a 6 × 6 matrix with
Consider now the space S spanned by the two eigenvectors $u_2, u_3$: $S = \mathrm{span}(\{u_2,u_3\})$. We call S an invariant subspace for A. Any vector built from the two eigenvectors, $x = \alpha u_2 + \beta u_3$, results in a vector Ax with the same property. In other words, applying the linear operator A results in staying in the same subspace. The vector never leaves such a subspace. Correspondingly, $T = \mathrm{span}(\{u_4,u_5,u_6\})$ is also an invariant subspace.
• identity: $I = \sum_{i=1}^{k} P_i$,
• with $P_i = \sum_{u_j\in U_i} u_j u_j^H$.
The matrices $P_i$ are projection matrices onto the (invariant) subspaces $\mathrm{span}(U_i)$, spanned by the normalized eigenvectors $u_j$.
Proof:
We already know that Hermitian matrices can be diagonalized by unitary matrices, thus:
\[
A = U\Lambda U^H = \sum_{i=1}^{n} u_i\lambda_i u_i^H = \sum_{i=1}^{n}\lambda_i\, u_i u_i^H = \sum_{i=1}^{k}\lambda_i\, P_i.
\]
Note that we have
\[
\sum_{i=1}^{n} u_i u_i^H = UU^H = I = \sum_{i=1}^{k} P_i.
\]
\[
A = \lambda_1\, u_1u_1^H + \lambda_2\big(u_2u_2^H + u_3u_3^H\big) + \lambda_3\big(u_4u_4^H + u_5u_5^H + u_6u_6^H\big).
\]
Here we identify $u_1u_1^H = P_1$ as the first projection. Note that the eigenvectors are normalized, thus $u_1u_1^H = u_1u_1^H/(u_1^Hu_1)$, which reveals the projection operator. Analogously, we identify $u_2u_2^H + u_3u_3^H = P_2$ and $u_4u_4^H + u_5u_5^H + u_6u_6^H = P_3$.
Such a spectral decomposition provides us with an insight into what a Hermitian matrix does when applied to an input vector x:
\[
Ax = \sum_{i=1}^{k}\lambda_i P_i\, x
= \sum_{i=1}^{k}\underbrace{\lambda_i}_{\text{stretching}}\;\underbrace{\sum_{u_j\in U_i} u_j u_j^H x}_{\text{project }x}.
\]
A Hermitian operator thus first projects the input onto its individual components (subspaces) and then stretches (or compresses, if $\lambda < 1$) the components. Finally all parts are added together again.
with the two eigenvalues $\lambda_1 = 5$ and $\lambda_2 = 10$. It has the corresponding eigenvectors
\[
u_1 = \frac{1}{\sqrt5}\begin{bmatrix}1\\2\end{bmatrix};\qquad u_2 = \frac{1}{\sqrt5}\begin{bmatrix}-2\\1\end{bmatrix}.
\]
Example 4.27 Consider a weakly stationary random process x with Hermitian autocorrelation matrix R_xx. The diagonalization of R_xx leads to:

R_{xx} = E[x x^H] = U \Lambda U^H .

We can now linearly filter such a random process and obtain y = U^H x. The autocorrelation matrix R_yy of this new random process is given by

R_{yy} = E[y y^H] = U^H E[x x^H] U = U^H U \Lambda U^H U = \Lambda .

We thus obtain a perfectly decorrelated random process. The eigenvalues can be interpreted as the individual energy terms of such a random process. Considering the eigenvalues, one realizes that some can be extremely small and thus contribute little to the ACF matrix. They could be neglected, approximating the process. In some applications such as image processing the top ten eigenvalues may contain 99 percent of the energy. Such an approximation based on a few strong eigenvalues is called the Karhunen-Loève description of the random process x.
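A minimal NumPy sketch of this decorrelation (added for illustration; the correlation model is an assumption): the eigenvectors of an estimated ACF matrix are used as the filter U^H, the transformed samples become approximately uncorrelated, and keeping only the strongest eigenvalues gives the Karhunen-Loève style approximation.

import numpy as np

rng = np.random.default_rng(0)

# Correlated Gaussian process with an assumed AR(1)-like covariance.
M, N = 8, 20000                         # vector length, number of observations
Rxx_true = 0.9 ** np.abs(np.subtract.outer(np.arange(M), np.arange(M)))
L = np.linalg.cholesky(Rxx_true)
X = L @ rng.standard_normal((M, N))     # columns are realizations of x

# Estimate the ACF matrix by temporal averaging (instead of an ensemble mean).
Rxx_hat = (X @ X.T) / N

# Diagonalize and decorrelate: y = U^H x.
lam, U = np.linalg.eigh(Rxx_hat)
Y = U.T @ X
Ryy = (Y @ Y.T) / N                     # approximately diagonal (= Lambda)
print(np.round(Ryy, 2))

# Karhunen-Loeve style approximation: keep only the strongest eigenvalues.
idx = np.argsort(lam)[::-1]
k = 3
energy_kept = lam[idx[:k]].sum() / lam.sum()
print("fraction of energy in top", k, "eigenvalues:", round(energy_kept, 3))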
By selecting eigenvectors as input, x = u_n, the quotient x^H A x / (x^H x) takes the value of the corresponding eigenvalue λ_n; in particular, for the eigenvector u_max associated with the largest eigenvalue we obtain λ_max, and so on. The two extremes are thus:

\max_x \frac{x^H A x}{x^H x} = \lambda_{\max}; \qquad \min_x \frac{x^H A x}{x^H x} = \lambda_{\min}.
Definition 4.23 (Rayleigh Quotient) The expression

r(A, x) = \frac{x^H A x}{x^H x}

is called Rayleigh quotient.
Note that such an expression only makes sense when applied to Hermitian matrices. We find the following important property: for every x ≠ 0 the Rayleigh quotient is bounded by the extreme eigenvalues, λ_min ≤ r(A, x) ≤ λ_max.
[Figure 4.4: Eigenfilter transmission: the signal x_k is corrupted by additive noise v_k and filtered by h to produce y_k. Knowing the statistics of x_k, what is the optimal impulse response h_k to maximize the SNR?]
Example 4.28 The matched filter (Ger.: Signalangepasstes Filter) is well known to maximize the signal-to-noise ratio for deterministic signals. If, however, the maximal signal-to-noise ratio of random signals is considered, we speak of an eigenfilter. We consider Figure 4.4 in which a random signal x_k is additively corrupted by noise v_k. We would like to design a filter h so that the Signal-to-Noise Ratio (SNR) at its output is maximized. The idea is thus to separate the signal from the noise as much as possible. We formulate the convolution by inner vector products:

y_k = h^T x_k + h^T v_k.

This allows us to consider noise and signal components individually. The signal power at the output of the filter is

P = h^T E[x x^H] h^* = h^H R_{xx} h.

On the other hand, the noise power is simply given by N = h^T E[v v^H] h^* = \sigma_v^2 h^H h. If we want to maximize the SNR we have to compute

\max_h \mathrm{SNR} = \max_h \frac{h^H R_{xx} h}{\sigma_v^2 \, h^H h}.

The desired optimal solution can thus simply be found as the eigenvector h = u_max of R_xx that is associated with the largest eigenvalue λ_max.
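The following NumPy sketch (illustrative only; the signal covariance is a made-up assumption) finds the eigenfilter as the dominant eigenvector of R_xx and compares its output SNR with that of an arbitrary filter.

import numpy as np

# Assumed lowpass-like signal covariance and noise variance (illustrative values).
M = 8
Rxx = 0.95 ** np.abs(np.subtract.outer(np.arange(M), np.arange(M)))
sigma_v2 = 1.0

def output_snr(h):
    # SNR = h^H Rxx h / (sigma_v^2 h^H h)
    return (h.conj() @ Rxx @ h).real / (sigma_v2 * (h.conj() @ h).real)

# Eigenfilter: eigenvector of Rxx belonging to the largest eigenvalue.
lam, U = np.linalg.eigh(Rxx)
h_opt = U[:, -1]                        # eigh returns eigenvalues in ascending order

h_avg = np.ones(M) / np.sqrt(M)         # a simple averaging filter for comparison
print("SNR eigenfilter:", output_snr(h_opt))   # equals lam[-1] / sigma_v^2
print("SNR averaging  :", output_snr(h_avg))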
Let us revisit Example 2.33, which we can now solve under the more realistic aspect of additive noise in the transmissions. For this we assume that the received vectors r_k^{(1)} and r_k^{(2)} are composed of the original vectors \tilde{r}_k^{(i)} and noise terms v_k^{(i)}:

r_k^{(i)} = \tilde{r}_k^{(i)} + v_k^{(i)}; \quad i = 1,2.

The noise terms are filtered by the channel estimates g^{(1)} and g^{(2)} and appear in sum at the output with \sigma_v^2 (\|g^{(1)}\|_2^2 + \|g^{(2)}\|_2^2) as noise energy. If there are multiple solutions possible,
we would certainly be interested in the one that has the smallest noise influence, that is, the smallest norm (minimum norm solution). Thus, rather than requiring the outcome of Rg to be the zero vector, we now want \|Rg\|_2^2 to be as small as possible. As every scaled version of g is acceptable, we can add the requirement that g be of unit norm and obtain a Rayleigh quotient problem:

g_{\mathrm{opt}} = \arg\min_g \frac{g^H R^H R g}{\|g\|_2^2}.
We thus have to find the smallest eigenvalue of RH R and select its eigenvector as solution.
Example 4.29 This technique can be used for filter design. Let us design a linear phase filter with 2N + 1 coefficients for which we are given the magnitude response (Ger.: Amplitudengang) H_d(e^{jΩ}):

H(e^{j\Omega}) = e^{-jN\Omega} H_R(e^{j\Omega}) = e^{-jN\Omega} \sum_{n=0}^{N} b_n \cos(n\Omega) = e^{-jN\Omega} \, b^T c(\Omega),

where we forced a linear phase filter due to symmetry constraints and filled the one-sided filter impulse response (scaled by a factor of two) into the vector b. A low pass filter is to be designed with limit frequencies Ω_p < Ω_s so that

H_d(e^{j\Omega}) = \begin{cases} 1; & 0 \le \Omega \le \Omega_p \\ 0; & \Omega_s \le \Omega \le \pi \end{cases}.
Note that this formulation is much more flexible than the previous ones, as we do not have a single fixed frequency at which the filter behavior changes rapidly from pass- to stopband (Ger.: Sperrbereich); instead we can have a large range between Ω_p and Ω_s where we do not care about the filter slope. We find the signal energy in the stopband to be

E_s = \int_{\Omega_s}^{\pi} \left| H_R(e^{j\Omega}) - H_d(e^{j\Omega}) \right|^2 d\Omega = \frac{1}{\pi} \, b^T \int_{\Omega_s}^{\pi} c(\Omega) c^T(\Omega) \, d\Omega \; b = b^T P b,

where a matrix Q is introduced analogously to P for the passband energy E_p. The entire filter problem is thus given by:

J(\alpha) = \alpha E_s + (1-\alpha) E_p = \alpha b^T P b + (1-\alpha) b^T Q b = b^T R b
for 0 ≤ α ≤ 1 with the abbreviation R = αP + (1 − α)Q. Dependent on the chosen value of
α we can put more emphasis on the stop- or the passband. Obviously, there is still freedom in the choice of b. We can remove it by normalizing in the form b^T b = 1. The filter problem thus now reads:

\min_b b^T R b \quad \text{subject to } b^T b = 1,

or, equivalently,

\min_b \frac{b^T R b}{b^T b} = \lambda_{\min}(R).
The problem can thus be solved by applying the Rayleigh quotient.
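A small NumPy sketch of such a design (illustrative; the order N, the band edges and the weighting α are arbitrary assumptions). For the passband a common eigenfilter trick is assumed here, penalizing the deviation of H_R(Ω) from its value at Ω = 0, so that E_p remains a pure quadratic form b^T Q b; b is then the eigenvector of R = αP + (1−α)Q belonging to the smallest eigenvalue.

import numpy as np

N = 10                                   # one-sided order: 2N+1 taps in total
Wp, Ws = 0.3 * np.pi, 0.5 * np.pi        # assumed pass- and stopband edges
alpha = 0.5                              # assumed stop/pass weighting
grid = 2000

def c(w):                                # c(w) = [1, cos(w), ..., cos(N w)]^T
    return np.cos(np.outer(np.arange(N + 1), w))

# Stopband energy matrix P: integral of c(w) c(w)^T over [Ws, pi].
w_s = np.linspace(Ws, np.pi, grid)
Cs = c(w_s)
P = Cs @ Cs.T * (w_s[1] - w_s[0])

# Passband matrix Q: deviation of H_R(w) from H_R(0) (assumed quadratic-form trick).
w_p = np.linspace(0.0, Wp, grid)
D = c(w_p) - c(np.zeros(1))
Q = D @ D.T * (w_p[1] - w_p[0])

R = alpha * P + (1 - alpha) * Q
lam, V = np.linalg.eigh(R)
b = V[:, 0]                              # unit-norm minimizer of the Rayleigh quotient

w = np.linspace(0, np.pi, 512)
H = b @ c(w)                             # resulting amplitude response H_R(e^{jw})
print("passband ripple:", np.ptp(H[w <= Wp]) / abs(H[0]))
print("stopband peak  :", np.max(np.abs(H[w >= Ws])) / abs(H[0]))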
Now the problem is suitable for solving it by a Rayleigh quotient. We find y_{\mathrm{opt}} = B^{-1/2} h_1, resulting in x_{\mathrm{opt}} = B^{-1} h_1 = (H^T H)^{-1} h_1. The corresponding maximal SLR is then h_1^T (H^T H)^{-1} h_1.
4. The additive noise models observation and modeling errors. It is a random process which we assume to be zero-mean complex-valued Gaussian. Its statistic is described by a sole parameter, its variance \sigma_w^2, which we hope to estimate.
What are the applications of such a model?
We rarely identify frequencies directly, but note that the Fourier transform of a delayed impulse response gives: F[h(t - \tau)] = H(j\omega)\,e^{-j\omega\tau}. Thus, all calculations of temporal changes or delays are equivalent to the determination of frequencies. This is, for example, being used in radar techniques. Equivalently, shock waves in the Earth can be measured by this and different layers of material can be identified. This is then naturally a method to detect large amounts of hidden fluids in rocks, such as petrol. With antenna arrays the information captured can
also be brought into AoA (Angle of Arrival) and AoD (Angle of Departure) computations in wireless systems and can be used to detect the number of scatterers and reflections. Depending on the wavelength this can lead to a map of the scanned area. But discriminating different angles of arrival also allows for spying techniques, that is, in a superposition of many sources, a single one can be picked and filtered out and thus become visible or audible.
The random amplitudes as well as the additive noise make x(t) a random process. Sampling such a process equidistantly with time period T over M > p positions, we obtain a vector x with

x = \sum_{i=1}^{p} a_i s_i + w.

Here, the noise is analogously sampled into the vector w, and we introduced the new deterministic vectors

s_i = \left[ 1, \, e^{j2\pi f_i T}, \, e^{j2\pi f_i 2T}, \, ..., \, e^{j2\pi f_i (M-1)T} \right]^T.
The autocorrelation matrix of this process is

R_{xx} = E[x x^H],

which in practice can be obtained by observing several of these vectors x at different times and computing a temporal mean for the ACF matrix rather than an ensemble average.
vectors si we can write the ACF matrix as
p
X
E |ai |2 si sH
Rxx = i + Rww .
i=1
Rxx = SP S H + Rww ,
where we introduced matrix S = [s1 ,s2 ,...,sp ], a concatenation of all vectors si as well as a
diagonal matrix P that contains all the signal powers E [|ai |2 ]. Note that matrix S is of
Vandermonde form. We can relate the problem in form of a LS problem:
x = S a + w
a_{LS} = \arg\min_{a \mid S, p} \| x - S a \|_2^2.

But since S and p are not given, the problem at hand is not simply an LS problem.
If there were no noise, the ACF matrix would only be spanned by the p signal components s_i, and thus

R_{xx|w=0} = \sum_{i=1}^{p} \lambda_i u_i u_i^H,

with slightly different eigenvalues but the same eigenvectors. The additive noise component adds \sigma_w^2 to all eigenvalues uniformly, as we assume white Gaussian noise: \tilde{\lambda}_i = \lambda_i + \sigma_w^2.
As long as the noise is zero, we thus have only p eigenvalues different from zero. This part of the space is spanned by the vectors s_i, but equivalently by the first p eigenvectors u_i:

\mathrm{span}\{s_1, s_2, ..., s_p\} = \mathrm{span}\{u_1, u_2, ..., u_p\} = \mathrm{span}\{U_S\},

meaning that although we do not know the individual vectors s_i (unless there is a single one, p = 1), we do know the space spanned by them. We call this the signal space. The remaining part is called the noise space:

N = \mathrm{span}\{u_{p+1}, u_{p+2}, ..., u_M\} = \mathrm{span}\{U_N\}.
In the signal space the LS problem thus reads:

x = U_S b + w
b_{LS} = \arg\min_{b \mid U_S, p} \| x - U_S b \|_2^2.
1. Finding p: in the classical PHD (Pisarenko Harmonic Decomposition) method, the ACF matrix is computed by (temporal) averaging of instantaneous vector products x x^H, and its decomposition into eigenvectors U and eigenvalues \tilde{\Lambda} is computed. Ordering the eigenvalues from largest to smallest typically reveals where the "noise floor" starts, that is, from which eigenvalue on only \sigma_w^2 is visible. Assuming that the largest values are discriminable against the noise-related values, p is determined.
2. Finding \sigma_w^2: once p is determined, a good estimate of the noise variance is simply given by averaging over all "noisy" eigenvalues:

\hat{\sigma}_w^2 = \frac{1}{M-p} \sum_{i=p+1}^{M} \tilde{\lambda}_i.
3. Finding S: defining the signal space is a rather difficult nonlinear problem. Pisarenko proposes to use the noise subspace N defined by the eigenvectors in U_N. They must be orthogonal to those in the signal subspace, in other words:

U_N^H S = 0,

or, more individually, for each signal vector s_i; i = 1,2,...,p and each noise eigenvector u_k; k = p+1, p+2,...,M we have

u_k^H s_i = 0.

In other words, we are now given a polynomial of order M − 1 with the coefficients u_{M,m} and only need to find its roots. We recognize here several (numerical) problems: (i) solving a polynomial of high dimension M can become very challenging, (ii) the obtained roots may not lie exactly on the unit circle, and, if M > p + 1, (iii) depending on which eigenvector u_k; k = p+1,...,M we take, we may get different results.
4. Finding a: for Pisarenko it suffices to compute the energies of the signal components, that is E[|a_i|^2]. For this we use the description

R_{xx} = S P S^H + R_{ww} = S P S^H + \sigma_w^2 I,

of which we only look at the first row. There we find the autocorrelation values r_{xx}(0), r_{xx}(1), ..., r_{xx}(M-1). Each of these elements can be computed by taking the first row of the left S (which contains only ones) and the m-th column of S^H:

r_{xx}(m) = \sum_{k=1}^{p} e^{-j2\pi T m f_k} \, E[|a_k|^2] + \delta_m \sigma_w^2; \quad m = 0,1,...,M-1.
The term \delta_m \sigma_w^2 only affects r_{xx}(0), as only this element sees the impact of the noise; the remaining ones are free of it. Note that we only need p values; we thus pick the first p autocorrelation values r_{xx}(1), ..., r_{xx}(p). We recognize a set of linear equations in E[|a_i|^2]:

\begin{pmatrix}
e^{-j2\pi T f_1} & e^{-j2\pi T f_2} & \cdots & e^{-j2\pi T f_p} \\
e^{-j2\pi T 2f_1} & e^{-j2\pi T 2f_2} & \cdots & e^{-j2\pi T 2f_p} \\
\vdots & & & \vdots \\
e^{-j2\pi T p f_1} & e^{-j2\pi T p f_2} & \cdots & e^{-j2\pi T p f_p}
\end{pmatrix}
\begin{pmatrix} E[|a_1|^2] \\ E[|a_2|^2] \\ \vdots \\ E[|a_p|^2] \end{pmatrix}
=
\begin{pmatrix} r_{xx}(1) \\ r_{xx}(2) \\ \vdots \\ r_{xx}(p) \end{pmatrix}.
The matrix on the left hand side is a Vandermonde matrix and thus invertible. The
set of equations can be solved.
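A compact NumPy sketch of this estimation chain (added for illustration; frequencies, powers and the model order are made up, and M = p + 1 is chosen so that a single noise eigenvector defines the polynomial; the final power-estimation step via the Vandermonde system is omitted for brevity).

import numpy as np

rng = np.random.default_rng(1)
f_true = np.array([0.12, 0.27])          # assumed normalized frequencies f*T
p, M, N = 2, 3, 5000                     # M = p + 1: exactly one noise eigenvector

# Snapshots x = sum_i a_i s_i + w with random unit-amplitude symbols.
k = np.arange(M)
S = np.exp(2j * np.pi * np.outer(k, f_true))           # Vandermonde matrix [s_1, s_2]
A = np.exp(2j * np.pi * rng.random((p, N)))
sigma_w = 0.1
W = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
X = S @ A + sigma_w * W

# 1) ACF estimate and eigendecomposition (eigh sorts eigenvalues in ascending order).
Rxx = X @ X.conj().T / N
lam, U = np.linalg.eigh(Rxx)

# 2) Noise variance: mean of the M - p smallest eigenvalues.
print("sigma_w^2 estimate:", lam[:M - p].mean())

# 3) Frequencies: roots of the polynomial built from the noise eigenvector,
#    since u_noise^H s(f) = 0 for the signal frequencies.
u_noise = U[:, 0]
roots = np.roots(u_noise[::-1].conj())
f_hat = np.sort(np.angle(roots) / (2 * np.pi) % 1.0)
print("estimated frequencies:", f_hat)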
Rather than looking for zeros of a polynomial, we now identify outstandingly high values, which are much easier to detect. The problem thus reduces to identifying the maxima of P(f) with a desired precision.
After estimating p and \sigma_w^2 we compute the noise free terms S P S^H as well as S P \Phi^H S^H and consider

S P [I - \lambda_i \Phi^H] S^H u = 0; \quad i = 1,2,...,p.

This is a generalized eigenvalue problem: S P S^H u = \lambda_i \, S P \Phi^H S^H u with generalized eigenvalues \lambda_i. These eigenvalues are the desired values \exp(j2\pi f_i).5
Theorem 4.10 (Singular Value Decomposition) Every matrix A from C^{m×n} can be decomposed in the following form: A = U \Sigma V^H with the unitary matrices U from C^{m×m} and V from C^{n×n} as well as a matrix \Sigma \in IR^{m×n}_{+0} with a diagonal block \Sigma_+ from IR^{r×r}_{+0} with r ≤ min(m,n).
The concept behind the theorem is that if we take any arbitrary matrix A, we can form
two positive semi-definite Hermitian matrices, that is
B1 = AH A; AH AV = V Λ1
B2 = AAH ; AAH U = U Λ2 .
Now the unitary matrices U and V as well as the diagonal matrices Λ1 and Λ2 with the
eigenvalues must be related.
Proof:
The proof relates to the eigenvalue decomposition of Hermitian matrices. We thus start by considering A^H A ∈ C^{n×n} rather than A. Let the eigenvalues of A^H A be ordered from largest to smallest, so that \lambda_1 \ge \lambda_2 \ge ... \ge \lambda_r > 0 and \lambda_{r+1} = ... = \lambda_n = 0. The eigenvalues fill a diagonal matrix \Lambda_1. The Hermitian A^H A can be diagonalized by the unitary matrix V: V^H A^H A V = \Lambda_1. The first r vectors u_i can thus be constructed by:

u_i = \frac{A v_i}{\sqrt{\lambda_i}}; \quad i = 1,2,...,r.
The vectors v_i are the columns of V. We find that \langle u_i, u_j \rangle = \delta_{i-j} for i,j = 1,2,...,r. The vectors u_i build the columns of a matrix U_1. The set {u_1, ..., u_r} from U_1 can be extended
5 There are numerical routines available for solving generalized eigenvalue problems; in Matlab use eig(A,B) rather than eig(A).
with orthogonal vectors (for example by the Gram-Schmidt method). We thus obtain

U = [u_1, ..., u_r, ..., u_m] = [U_1, U_2]: \quad U^H U = I.

Obviously, the vectors in U are eigenvectors of A A^H from C^{m×m}:

A A^H u_i = A A^H \frac{A v_i}{\sqrt{\lambda_i}} = A \lambda_i \frac{v_i}{\sqrt{\lambda_i}} = \lambda_i u_i.
This is clear for the eigenvalues that are distinct from zero. For the zero eigenvalues
the corresponding eigenvectors must come from the nullspace of AAH . Since we have
for Hermitian matrices that R(AAH ) is the orthogonal complement to N ((AAH )H ) =
N (AAH ), all eigenvectors are orthogonal.
We therefore find for U^H A V:

• i = 1,2,...,r:

u_i^H A v_j = \frac{1}{\sqrt{\lambda_i}} v_i^H A^H A v_j = \sqrt{\lambda_i} \, \delta_{i-j}.

• i = r+1, r+2,...,m:

A A^H u_i = 0 \;\rightarrow\; \lambda_i = 0.

The vector z_i = A^H u_i is in the nullspace of A (A z_i = 0) and at the same time in the range of A^H; hence z_i = A^H u_i = 0. From A^H u_i = 0 we also have v_j^H A^H u_i = (u_i^H A v_j)^H = 0.

Thus, U^H A V = \Sigma contains a diagonal block \Sigma_+ with the non-zero elements \sqrt{\lambda_j}, j = 1,...,r.
Note that r denotes the number of non-zero eigenvalues. If r = min(m,n), all potential
eigenvalues are non-zero and the matrix is of full rank. However, if this is not the case
then we have r < min(m,n).
Example 4.31: Let the matrix have the two singular values \sigma_1 and \sigma_2. If it is an m × n = 2 × 3 matrix, we find:

\Sigma = \begin{pmatrix} \sigma_1 & 0 & 0 \\ 0 & \sigma_2 & 0 \end{pmatrix}.

If, on the other hand, it is an m × n = 3 × 2 matrix:

\Sigma = \begin{pmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \\ 0 & 0 \end{pmatrix}.
B_1 V = V \Lambda_1 = V \Sigma^T \Sigma,
B_2 U = U \Lambda_2 = U \Sigma \Sigma^T.

Thus \Lambda_1 = \Sigma^T \Sigma and \Lambda_2 = \Sigma \Sigma^T contain the eigenvalues on their diagonals. Both matrices B_1 and B_2 have the same singular values and thus the same non-zero eigenvalues. Continuing the previous example leads for a 2 × 3 matrix to

\Sigma^T \Sigma = \begin{pmatrix} \sigma_1^2 & 0 & 0 \\ 0 & \sigma_2^2 & 0 \\ 0 & 0 & 0 \end{pmatrix}; \quad \text{and} \quad \Sigma \Sigma^T = \begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix}.
Note further that if A is from IR then all matrices (U, \Sigma, V) are from IR. We can explicitly formulate the spectral decomposition of a matrix A ∈ C^{m×n} with p = min(m,n) and obtain:

A = U \Sigma V^H = [U_1 \; U_2] \, \Sigma \begin{pmatrix} V_1^H \\ V_2^H \end{pmatrix} = \sum_{i=1}^{p} \sigma_i u_i v_i^H = \sum_{i=1}^{r} \sigma_i u_i v_i^H.
We recognize a split into [U_1, U_2] and [V_1, V_2] in which {U_1, V_1} are associated with the singular values, and {U_2, V_2} with the zero blocks. Accordingly, the first summation also takes into account terms that are zero, specifically \sigma_{r+1} = \sigma_{r+2} = ... = \sigma_p = 0. In the second sum they are simply left out as they do not contribute. We can equivalently write

A = \sum_{i=1}^{r} \sigma_i u_i v_i^H = U_1 \Sigma_+ V_1^H.

This can lead to two moderately small matrices \tilde{U}_1, \tilde{V}_1 that fully describe the larger matrix A. It is thus a form of data compression.
Example 4.33 Recall the induced matrix norm \|A\|_{2,\mathrm{ind}} from Section 2.3.2: it equals the largest singular value \sigma_{\max} of A.
Note that the rank of a matrix A is given by the number of (nonzero) singular values.
R(A) = \{ b \in C^m : b = A x \}
     = \{ b \in C^m : b = U \Sigma V^H x \}
     = \{ b \in C^m : b = U \Sigma y \}
     = \{ b \in C^m : b = [U_1 \; U_2] \begin{pmatrix} \Sigma_+ y_1 \\ 0 \end{pmatrix} \}
     = \{ b \in C^m : b = U_1 z \}
     = \mathrm{span}(U_1).
R(A) = span(U1 ),
N (A) = span(V2 ),
R(AH ) = span(V1 ),
N (AH ) = span(U2 ).
Let us now revisit the LS problem with overdetermined parameters. We thus have more observations m than parameters n:

\min_x \| A x - b \|_2^2 = \min_x \| U \Sigma V^H x - b \|_2^2 = \min_{\tilde{x}} \| \Sigma \tilde{x} - \tilde{b} \|_2^2, \quad \tilde{x} = V^H x, \; \tilde{b} = U^H b.

We recognize that in this equivalent form only the first n observations are taken into account; the rest is simply discarded. The LS method thus searches in the reduced observation space C^n for the solution with smallest norm. The solution of this is obtained by the pseudoinverse of \Sigma:
\Sigma = \begin{pmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_r \\ 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} \Sigma_+ \\ O \end{pmatrix}, \qquad
\Sigma^{\#} = \begin{pmatrix} \frac{1}{\sigma_1} & & & 0 \\ & \ddots & & \vdots \\ & & \frac{1}{\sigma_r} & 0 \end{pmatrix} = \begin{pmatrix} \Sigma_+^{-1} & O \end{pmatrix}.
We find as LS solution:

\tilde{x} = \Sigma^{\#} \tilde{b}
V^H x = \Sigma^{\#} U^H b
x = V \Sigma^{\#} U^H b.

This is identical to the known pseudoinverse solution, since

(A^H A)^{-1} A^H = V \Sigma^{\#} U^H.
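A brief NumPy check (illustrative, random data) that the SVD-based solution x = V Σ^# U^H b coincides with the normal-equation pseudoinverse (A^H A)^{-1} A^H b in the overdetermined case.

import numpy as np

rng = np.random.default_rng(3)
m, n = 12, 4                               # more observations than parameters
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

U, s, Vh = np.linalg.svd(A, full_matrices=True)

# Build Sigma^#: transpose the shape and invert the non-zero singular values.
Sig_pinv = np.zeros((n, m))
Sig_pinv[:len(s), :len(s)] = np.diag(1.0 / s)

x_svd = Vh.conj().T @ Sig_pinv @ U.conj().T @ b
x_ne = np.linalg.solve(A.T @ A, A.T @ b)   # (A^H A)^{-1} A^H b

print(np.allclose(x_svd, x_ne))            # True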
How does this relate to the underdetermined solution? Let us now consider m < n, b ∈ C^m, x ∈ C^n. The previous formulation now reads:

\Sigma \tilde{x} = \begin{pmatrix}
\sigma_1 & & & 0 & \cdots & 0 \\
& \ddots & & \vdots & & \vdots \\
& & \sigma_m & 0 & \cdots & 0
\end{pmatrix}
\begin{pmatrix} \tilde{x}_1 \\ \vdots \\ \tilde{x}_m \\ \tilde{x}_{m+1} \\ \vdots \\ \tilde{x}_n \end{pmatrix}
= \begin{pmatrix} \tilde{b}_1 \\ \tilde{b}_2 \\ \vdots \\ \tilde{b}_m \end{pmatrix}.
In this case components of x in the parameter space are eliminated. Let us thus consider the solution of the underdetermined LS problem:

x = A^H (A A^H)^{-1} b
  = V \Sigma^T U^H \left( U \Sigma V^H V \Sigma^T U^H \right)^{-1} b
  = V \Sigma^T U^H \left( U \Sigma \Sigma^T U^H \right)^{-1} b
  = V \Sigma^T (\Sigma \Sigma^T)^{-1} U^H b.

We now recognize the term \Sigma^T (\Sigma \Sigma^T)^{-1} = \Sigma^{\#} as the pseudoinverse of the singular value matrix. The underdetermined LS method also finds a minimum norm solution, however now in a reduced parameter space.
[Figure 4.5: MIMO transmission: knowing the channel H, how do we design prefilter and receiver filter to obtain maximum capacity?]
where SNR denotes the transmit SNR at unit transmit power. Here a degree of freedom is the precoding defined by a covariance matrix R. The question thus is how to design such a matrix R under the power constraint tr(R) = K, i.e., the overall transmit power is limited to a maximum power K. The solution is found by applying the SVD to the channel matrix H:

H = U \Sigma V^H.

Selecting R = V P V^H with a diagonal matrix P, we can reformulate the problem into its equivalent representation:

c = \max_{\sum_{i=1}^{r} P_i = K} \; \log_2 \prod_{i=1}^{r} \left( 1 + \frac{\mathrm{SNR}}{N_T} P_i \sigma_i^2 \right).

The value r refers to the rank of H, or equivalently the number of singular values that are non-zero. Further straightforward calculation finally leads to:

c = \max_{\sum_{i=1}^{r} P_i = K} \; \sum_{i=1}^{r} \log_2 \left( 1 + \frac{\mathrm{SNR}}{N_T} P_i \sigma_i^2 \right).
We can now interpret the MIMO transmission as transmitting over r individual channels,
each of them has its own SNR, given by SNRPi σi2 /NT . The optimal distribution of the
individual power shares is given by the well-known waterfilling solution (Shannon 1948).
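A small NumPy sketch of the waterfilling power allocation over the r eigenchannels (added for illustration; singular values, SNR, N_T and the power budget K are assumptions). The water level is found by the usual active-set iteration.

import numpy as np

sigma = np.array([1.8, 1.2, 0.7, 0.2])      # assumed singular values of H
SNR, NT, K = 10.0, 4, 4.0                    # assumed transmit SNR, antennas, power budget

gain = SNR * sigma**2 / NT                   # per-channel gain in log2(1 + gain_i * P_i)

def waterfilling(gain, K):
    # Water level mu with P_i = max(mu - 1/gain_i, 0) and sum(P_i) = K.
    active = np.ones(len(gain), dtype=bool)
    while True:
        mu = (K + np.sum(1.0 / gain[active])) / active.sum()
        P = np.where(active, np.maximum(mu - 1.0 / gain, 0.0), 0.0)
        if np.all(P[active] > 0):
            return P
        active = P > 0                       # drop channels that received no power

P = waterfilling(gain, K)
capacity = np.sum(np.log2(1.0 + gain * P))
print("power allocation:", np.round(P, 3), " sum =", P.sum())
print("capacity [bit/channel use]:", round(capacity, 3))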
W = W W^T W.

Applying the SVD W = U \Sigma V^T on both sides yields

U \Sigma V^T = U \Sigma V^T (U \Sigma V^T)^T U \Sigma V^T = U \Sigma \Sigma^T \Sigma V^T.

As the left and right hand matrices are unchanged, only the diagonal part needs to be considered. Mathematically we obtain problems of the form \sigma = \sigma^3, for which three solutions are possible: {−1, 0, 1}. Since we deal with singular values that cannot be negative per
definition, only two solutions remain. The solution of the problem is thus given by all
permutations of zeros and ones on the diagonal of Σ and the two unitary matrices U and V .
Let the step-size µ > 0 and W_0 = X be an initial value for the algorithm

W_{k+1} = W_k + \mu W_k (I - W_k^T W_k),

assuming X ∈ IR^{m×n} with rank r = n. Does the algorithm converge, and if so, to which fixed points? We can answer these questions by applying the SVD on the left- and right-hand side of the algorithm. With W_k = U \Sigma_k V^T (the singular vectors of X are preserved by the recursion) we obtain

U \Sigma_{k+1} V^T = U \Sigma_k V^T + \mu U \Sigma_k V^T (I - V \Sigma_k^T \Sigma_k V^T) = U \left[ \Sigma_k + \mu \Sigma_k (I - \Sigma_k^T \Sigma_k) \right] V^T.

We recognize that we can drop the unitary matrices U and V as they have no influence on the convergence of the algorithm, and we obtain the much simpler, element-wise form:

\sigma_{i,k+1} = \sigma_{i,k} + \mu \sigma_{i,k} (1 - \sigma_{i,k}^2).

Obviously zero is a fixed point. The only other potential fixed point is given by one, as we can show by:

1 - \sigma_{i,k+1} = 1 - \sigma_{i,k} - \mu \sigma_{i,k}(1-\sigma_{i,k})(1+\sigma_{i,k}) = (1-\sigma_{i,k})\left(1 - \mu\sigma_{i,k}(1+\sigma_{i,k})\right).

We can thus conclude that as long as |1 − \mu\sigma_{i,k}(1 + \sigma_{i,k})| < 1 the algorithm will converge. The solution of the iterative algorithm is thus W_\infty = U I_+ V^T, where I_+ stands for a matrix with zeros or ones on its main diagonal. If, due to the initial condition X, all components are excited, then no zeros occur and W_\infty^T W_\infty = I.
A typical application is blind source separation, where superpositions of signals are recorded and then separated again. Take for example a cocktail party with many guests (say six). If you record the speech there with several microphones (say eight) distributed in the room, chances are high to obtain the individual speech of all six persons.
But the algorithm can also be used for other difficult numerical tasks, as it is very suitable for a fixed-point processor due to a processing that only contains add/multiply operations. Reformulate the algorithm, for example, to

W_{k+1} = W_k + \mu W_k (I - W_k^T R^2 W_k).

The algorithm will converge to W_\infty^T R^2 W_\infty = I, and thus a square matrix W_\infty equals the inverse R^{-1} up to a unitary factor U. Starting with W_0 = R, for example, results in the desired modal space and we obtain W_\infty = R^{-1}. Similarly, the square root of a matrix can be computed by

W_{k+1} = W_k + \mu W_k (I - W_k^T R^{-1} W_k).
Due to the cubic part, the algorithm typically performs very quickly. Once the estimate is
close to the solution, very few more iterations are required. But if the initial point is very
far from the solution, many iterations can be necessary. It is thus important to use as much
a-priori information as possible to have a good starting point.
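A quick NumPy check (added for illustration) of the variant W_{k+1} = W_k + µ W_k (I − W_k^T R² W_k). The test matrix is an assumption, deliberately scaled so that its eigenvalues lie well below one and the simple fixed step-size stays stable; starting from W_0 = R, the iteration reproduces R^{-1} using only additions and multiplications.

import numpy as np

rng = np.random.default_rng(4)

# Assumed symmetric positive definite R with eigenvalues in (0.3, 0.7).
Q = np.linalg.qr(rng.standard_normal((5, 5)))[0]
R = Q @ np.diag(rng.uniform(0.3, 0.7, 5)) @ Q.T

mu = 0.3
W = R.copy()                          # start at W0 = R as suggested in the text
for _ in range(200):                  # only add/multiply operations per step
    W = W + mu * W @ (np.eye(5) - W.T @ R @ R @ W)

print("||W^T R^2 W - I|| =", np.linalg.norm(W.T @ R @ R @ W - np.eye(5)))
print("||W - inv(R)||    =", np.linalg.norm(W - np.linalg.inv(R)))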
Example 4.36 (Coordinated Multipoint Problem 2) Let us reconsider Example 4.30, a coordinated multi-point problem. Different from the previous example, we now use several antennas (N), possibly even from several base-stations (thus the word coordinated), to serve K < N users with single antennas. We look at the first user and find his receive signal (without noise)

r_1 = h_1^T x,

where h_1 denotes the channel vector of the N antennas to this user and x is the precoding vector we try to find. The channels h_2, h_3, ..., h_K of the remaining users 2,3,...,K we stack as rows into a matrix, say H ∈ R^{(K-1)×N}. We now try to minimize the leakage to the other users, assuming that if no power leaks, it will arrive at the desired user. Equivalently, we maximize the Signal-to-Leakage Ratio (SLR):

\mathrm{SLR} = \max_x \frac{\|h_1^T x\|_2^2}{\|H x\|_2^2} = \max_x \frac{x^T h_1 h_1^T x}{x^T H^T H x}.
Different from problems relating to the Rayleigh quotient, we now have to take into account that x_opt ∈ N(H^T H), such that the ratio becomes infinite. If the nullspace is sufficiently large, an infinite set of solutions is possible. Nevertheless, not all solutions are of the same quality, as some may increase the received signal strength more than others. A constructive way is to identify the nullspace of H^T H and then find the linear combination of the nullspace basis that maximizes the receive power. We thus apply the SVD on H^T H:

H^T H = U \Sigma U^T = [U_1, U_2] \, \Sigma \, [U_1, U_2]^T,
where U_2 spans the nullspace. That is, any vector from the nullspace is feasible. We can now formulate the problem as:

\max_x \frac{x^T h_1 h_1^T x}{x^T H^T H x} \;\Rightarrow\; \max_y \frac{y^T U_2^T h_1 h_1^T U_2 y}{y^T y}.

This problem we can easily solve with the Rayleigh quotient. The so-obtained y_opt = U_2^T h_1 needs to be resubstituted to x_opt = U_2 y_opt = U_2 U_2^T h_1. We can interpret this solution as the optimal solution h_1 projected onto the nullspace spanned by U_2, as U_2 U_2^T = U_2 [U_2^T U_2]^{-1} U_2^T is a projection operator. The so-obtained maximal SLR is given by:

\mathrm{SLR} = \frac{(h_1^T U_2 U_2^T h_1)^2}{0} \rightarrow \infty.
Compared to the first problem in Example 4.30, we recognize that the term (H^T H)^{-1} is now replaced by U_2 U_2^T.
Alternatively, in such a case, a better metric may be the Signal-to-Noise-and-Leakage Ratio (SNLR):

\mathrm{SNLR} = \max_x \frac{x^T h_1 h_1^T x}{x^T H^T H x + \sigma_v^2 x^T x} = \max_x \frac{x^T h_1 h_1^T x}{x^T [H^T H + \sigma_v^2 I] x}.

Now the ratio cannot become infinite and all vectors from the nullspace obtain an equal contribution to the denominator.
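An illustrative NumPy sketch (random channels; all values assumed) of the two precoders: the leakage-free SLR solution x = U_2 U_2^T h_1 from the nullspace of H^T H, and the SNLR solution, which for the rank-one numerator reduces to the direction (H^T H + σ_v² I)^{-1} h_1 (this closed form is inferred here, not spelled out in the script).

import numpy as np

rng = np.random.default_rng(5)
N, K = 6, 3                                   # antennas and users (assumed)
h1 = rng.standard_normal(N)                   # channel to the desired user
H = rng.standard_normal((K - 1, N))           # channels to the remaining users (rows)
sigma_v2 = 0.1

# SLR solution: project h1 onto the nullspace of H (spanned by U2).
U, s, Vt = np.linalg.svd(H)
U2 = Vt[K - 1:, :].T                          # last N-(K-1) right singular vectors
x_slr = U2 @ U2.T @ h1
print("leakage ||H x||^2 :", np.linalg.norm(H @ x_slr)**2)    # ~ 0
print("signal |h1^T x|^2 :", (h1 @ x_slr)**2)

# SNLR solution: dominant direction of the generalized Rayleigh quotient.
x_snlr = np.linalg.solve(H.T @ H + sigma_v2 * np.eye(N), h1)
x_snlr /= np.linalg.norm(x_snlr)
snlr = (h1 @ x_snlr)**2 / (x_snlr @ (H.T @ H + sigma_v2 * np.eye(N)) @ x_snlr)
print("SNLR of x_snlr    :", snlr)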
Example 4.37 (Condition Numbers) Let us consider a linear set of equations in which, after the SVD has been applied, one singular value is very small, say \epsilon, while the others are reasonably large:

\Sigma = \begin{pmatrix} \sigma_{\max} & & & \\ & \ddots & & \\ & & \sigma_{n-1} & \\ & & & \epsilon \end{pmatrix}.
In practice it can be wise to replace such an \epsilon by zero and continue the computation with a lower rank n − 1. Methods that replace very small singular values by zero are called rank reducing methods. Before we can apply such a rank reducing method, however, we should have a metric that quantifies the matrix and tells us how small the \epsilon really is when compared to the remaining values. A common metric for such quality is the so-called condition number, defined by

\kappa(A) = \|A\|_{2,\mathrm{ind}} \, \|A^{-1}\|_{2,\mathrm{ind}}.
We recall that the 2-induced matrix norm picks out the largest singular value \sigma_{\max} of matrix A. Correspondingly,

\frac{1}{\sigma_{\min}} = \max_x \frac{\|A^{-1} x\|_2}{\|x\|_2} = \frac{1}{\min_x \frac{\|A x\|_2}{\|x\|_2}}. \tag{4.2}
Therefore the condition number is nothing else but the ratio of the largest and smallest singular values:

\kappa(A) = \frac{\sigma_{\max}}{\sigma_{\min}}.
Consequently, condition numbers are nonnegative numbers whose minimum is one.
In case matrix A is Hermitian, we also find a relation given by the eigenvalues of A:

\kappa(A) = \frac{\lambda_{\max}(A)}{\lambda_{\min}(A)}.

For non-Hermitian matrices A, an alternative form in terms of eigenvalues is possible:

\kappa(A) = \sqrt{\frac{\lambda_{\max}(A^H A)}{\lambda_{\min}(A^H A)}}.
The condition number relates directly to the numerical quality of a solution of a linear set of equations. For this we consider the distorted problem

A(x + \Delta x) = b + \Delta b,

in which a distortion \Delta b on the right-hand side corresponds to a distortion \Delta x in the solution. The condition number \kappa(A) bounds by how much a distortion on one side relates to the distortion on the other side:

\frac{\|\Delta x\|}{\|x\|} \le \kappa(A) \frac{\|\Delta b\|}{\|b\|}.
4.7 Exercises
Exercise 4.1 Consider Theorem 4.3. Prove the remaining parts in an analogous way.
Exercise 4.2 Consider Theorem 4.7. Prove the remaining parts in an analogous way.
Exercise 4.3 Prove Theorem 4.8.
Exercise 4.4 Consider skew-symmetric matrices, i.e.
AH = −A.
1. Show that all eigenvalues are zero or purely imaginary.
2. Derive a projection for arbitrary square matrices to skew-symmetric matrices and to
Hermitian matrices.
Exercise 4.5 Show with SVD techniques that Equation (4.2) is correct.
Exercise 4.6 Show that for the Frobenius norm the following is true:
kABkF ≤ kAkF kBk2,ind
kABkF ≤ kBkF kAk2,ind .
Chapter 5
Matrix Operations
5.1 Motivation
Many problems can be formulated as the solution of a large linear system of equations.
Although in principle mathematical methods are known, once the systems become very large (say 10,000 equations), they become difficult and tedious to solve. Two sources of difficulties occur:

1. Numerical problems: due to rounding errors in floating point (fixed-point solutions for large matrices are typically infeasible due to numerical problems). A method to describe the severity of such numerical problems is the condition number, see Example 4.37.

2. Complexity problems: with an increasing number of equations the complexity grows as well; it thus takes longer and longer until the result is computed.
Both problems are present in many general systems of linear equations and are difficult to solve. If, however, the systems are structured in some form, additional information may be deduced from them simply by their structure, and by this the problems at hand can be reduced. Take for example the following block structured square matrix

C = \begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix},
comprising two smaller square matrices A, B ∈ IR^{N×N}. If we know the eigenvalues \alpha_i, \beta_i and corresponding eigenvectors a_i and b_i of A and B, respectively, then we can also say something about C:

\begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix} \begin{pmatrix} a_i \\ 0 \end{pmatrix} = \alpha_i \begin{pmatrix} a_i \\ 0 \end{pmatrix}, \qquad
\begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix} \begin{pmatrix} 0 \\ b_i \end{pmatrix} = \beta_i \begin{pmatrix} 0 \\ b_i \end{pmatrix}; \quad i = 1,2,...,N.
But even more complicated constructs can be investigated. Take for example now:

C = \begin{pmatrix} A & B \\ B & A \end{pmatrix}.

We can show that the eigenvalues of C may occur in pairs. If we know one eigenvalue \lambda_i with its corresponding eigenvector, we also know the other:

\begin{pmatrix} A & B \\ B & A \end{pmatrix} \begin{pmatrix} x_i \\ y_i \end{pmatrix} = \lambda_i \begin{pmatrix} x_i \\ y_i \end{pmatrix}, \qquad
\begin{pmatrix} A & B \\ B & A \end{pmatrix} \begin{pmatrix} y_i \\ x_i \end{pmatrix} = \lambda_i \begin{pmatrix} y_i \\ x_i \end{pmatrix}; \quad i = 1,2,...,N.
Thus, as long as x_i ≠ y_i and x_i ≠ −y_i, the eigenvalues occur in pairs, otherwise not. Try yourself on the skewed block matrix1

C = \begin{pmatrix} A & B \\ -B & A \end{pmatrix}.
It turns out that half of the eigenvalues of C are twice those of A and the other half is zero.
To understand such behavior better, we will consider Tensor operations next.
Example 5.1

A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}; \qquad B = \begin{pmatrix} 7 & 7 & 7 \\ 9 & 9 & 9 \end{pmatrix}

A \otimes B = \begin{pmatrix}
7 & 7 & 7 & 14 & 14 & 14 \\
9 & 9 & 9 & 18 & 18 & 18 \\
21 & 21 & 21 & 28 & 28 & 28 \\
27 & 27 & 27 & 36 & 36 & 36
\end{pmatrix}

B \otimes A = \begin{pmatrix}
7 & 14 & 7 & 14 & 7 & 14 \\
21 & 28 & 21 & 28 & 21 & 28 \\
9 & 18 & 9 & 18 & 9 & 18 \\
27 & 36 & 27 & 36 & 27 & 36
\end{pmatrix}
1. Commutativity: In general,

A \otimes B \ne B \otimes A,

i.e., commutativity does not hold. The previous example shows, however, that the elements of both tensor products are identical; they only occur at different positions. We can thus fix this missing property by a so-called permutation matrix that reorders the elements.
2. Commutativity by Permutations: Consider a permutation matrix P that is composed of many products of elementary permutation matrices, in which each elementary permutation is defined by its index pair (i_k, j_k):

P_{i_k j_k} = \begin{pmatrix}
1 & & & & & \\
& \ddots & & & & \\
& & 0 & \cdots & 1 & \\
& & \vdots & \ddots & \vdots & \\
& & 1 & \cdots & 0 & \\
& & & & & 1
\end{pmatrix}

with the exchange of the i_k-th and j_k-th column and row elements. Note that such a permutation matrix is unitary: P^T P = I. With the help of suitable permutation matrices the desired order of the elements can be assured and we obtain

B \otimes A = P^T (A \otimes B) P.
3. Distributivity: We find:
(A + B) ⊗ C = (A ⊗ C) + (B ⊗ C),
A ⊗ (B + C) = (A ⊗ B) + (A ⊗ C).
4. Associativity: We find:
(A ⊗ B) ⊗ C = A ⊗ (B ⊗ C).
5. Transpose: We find:
(A ⊗ B)T = AT ⊗ B T ,
(A ⊗ B)H = AH ⊗ B H .
6. Trace: We find:
tr(A ⊗ B) = tr(A)tr(B).
13. Pseudo Inverse: The pseudo inverse of A ⊗ B is given by the Kronecker product of the pseudo inverses:

(A \otimes B)^{\#} = V_{A\otimes B} \Sigma_{A\otimes B}^{\#} U_{A\otimes B}^H = (V_A \otimes V_B)(\Sigma_A^{\#} \otimes \Sigma_B^{\#})(U_A^H \otimes U_B^H) = (V_A \Sigma_A^{\#} U_A^H) \otimes (V_B \Sigma_B^{\#} U_B^H) = A^{\#} \otimes B^{\#}.

As shown in the context of the SVD in the previous chapter, we find for the two pseudo inverses:

(X^H X)^{-1} X^H = V \Sigma^{\#} U^H
X^H (X X^H)^{-1} = V \Sigma^{\#} U^H,

for which we set X = A ⊗ B. Obviously the positions of the unitary matrices U and V remain unchanged: U_{A⊗B} = U_A ⊗ U_B and V_{A⊗B} = V_A ⊗ V_B. Only the singular value matrix \Sigma_{A⊗B} undergoes some transformation. Let us take a closer look at this (for the first pseudo inverse):

\Sigma_{A\otimes B}^{\#} = \left( \Sigma_{A\otimes B}^T \Sigma_{A\otimes B} \right)^{-1} \Sigma_{A\otimes B}^T
= \left( (\Sigma_A \otimes \Sigma_B)^T (\Sigma_A \otimes \Sigma_B) \right)^{-1} (\Sigma_A \otimes \Sigma_B)^T
= \left( (\Sigma_A^T \Sigma_A) \otimes (\Sigma_B^T \Sigma_B) \right)^{-1} (\Sigma_A^T \otimes \Sigma_B^T)
= \left( (\Sigma_A^T \Sigma_A)^{-1} \otimes (\Sigma_B^T \Sigma_B)^{-1} \right) (\Sigma_A^T \otimes \Sigma_B^T)
= \left( (\Sigma_A^T \Sigma_A)^{-1} \Sigma_A^T \right) \otimes \left( (\Sigma_B^T \Sigma_B)^{-1} \Sigma_B^T \right) = \Sigma_A^{\#} \otimes \Sigma_B^{\#}.
Proofs: Let us prove the first property. Select one entry of the matrix (or block),
for example aij , and show that the desired property is satisfied; then generalize. We find
a11 B 6= b11 A. All listed properties can be shown straightforwardly.
Proof:
Let A a_i = \alpha_i a_i and B b_j = \beta_j b_j. Then simply multiplying delivers:

(A \otimes B)(a_i \otimes b_j) = (A a_i) \otimes (B b_j) = \alpha_i \beta_j \, (a_i \otimes b_j).

The eigenvalues of A are given by (6, −1) and those of B by (10, 4). If we compute the Kronecker product, we find:

A \otimes B = \begin{pmatrix}
28 & -12 & 35 & -15 \\
-12 & 28 & -15 & 35 \\
14 & -6 & 7 & -3 \\
-6 & 14 & -3 & 7
\end{pmatrix},

whose eigenvalues are the pairwise products (60, 24, −10, −4).
(A ⊕ B) = (A ⊗ In ) + (Im ⊗ B).
(A ⊕ B) = (In ⊗ A) + (B ⊗ Im ),
(A ⊕ B) = (In ⊗ A) + (B T ⊗ Im ).
Proof:
Let A a_i = \alpha_i a_i and B b_j = \beta_j b_j. Then simply multiplying delivers:

(A \oplus B)(a_i \otimes b_j) = (A a_i) \otimes b_j + a_i \otimes (B b_j) = (\alpha_i + \beta_j)(a_i \otimes b_j).

The eigenvalues of A are given by \frac{5}{2} \pm \frac{1}{2}\sqrt{33}, those of B by (0, 0, 27). The eigenvalues of

(A \oplus B) = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \otimes I_3 + I_2 \otimes \begin{pmatrix} 7 & 7 & 7 \\ 9 & 9 & 9 \\ 11 & 11 & 11 \end{pmatrix}
= \begin{pmatrix}
8 & 7 & 7 & 2 & 0 & 0 \\
9 & 10 & 9 & 0 & 2 & 0 \\
11 & 11 & 12 & 0 & 0 & 2 \\
3 & 0 & 0 & 11 & 7 & 7 \\
0 & 3 & 0 & 9 & 13 & 9 \\
0 & 0 & 3 & 11 & 11 & 15
\end{pmatrix}

are thus given by \left( 2.5 \pm \tfrac{1}{2}\sqrt{33}, \; 2.5 \pm \tfrac{1}{2}\sqrt{33}, \; 29.5 \pm \tfrac{1}{2}\sqrt{33} \right).
Definition 5.3 (Vec Operator) Let the matrix A be of dimension m × n. The vec operator rearranges the elements of a matrix column-wise into a vector:

A = [a_1, a_2, ..., a_n]; \qquad \mathrm{vec}[A] = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}.
Here are some important properties of the vec-operator:
1. Trace: We find:
tr(AB) = vec[AT ]T vec[B],
tr(AB) = vec[AH ]H vec[B].
Let us consider a further example. Given a matrix A ∈ IR^{m×m} and a matrix B ∈ IR^{n×n} as well as two rectangularly shaped matrices X, C ∈ IR^{n×m}, we would like to solve:

X A^T + B X = C.

Writing I_n X A^T + B X I_m = C and applying the vec operator yields

(A \otimes I_n + I_m \otimes B) \, \mathrm{vec}[X] = \mathrm{vec}[C].

We recognize a Kronecker sum. We can thus solve for X if the pairwise sums of all eigenvalues of A and B are non-zero.
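A short NumPy sketch (matrix sizes and values are arbitrary) that solves X A^T + B X = C via the vec/Kronecker-sum identity above and verifies the result.

import numpy as np

rng = np.random.default_rng(6)
m, n = 4, 3
A = rng.standard_normal((m, m))
B = rng.standard_normal((n, n))
C = rng.standard_normal((n, m))

# vec(X A^T + B X) = (A kron I_n + I_m kron B) vec(X),
# where vec stacks the columns of X (column-major / Fortran order).
K = np.kron(A, np.eye(n)) + np.kron(np.eye(m), B)
x = np.linalg.solve(K, C.flatten(order="F"))    # solvable if no eigenvalue sum is zero
X = x.reshape((n, m), order="F")

print("residual:", np.linalg.norm(X @ A.T + B @ X - C))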
Whether a large dimensional matrix is separable into smaller ones is not easily visible.
Definition 5.4 A matrix A is called separable (Ger.: separierbar) if there are two matrices
A1 ,A2 for which we have:
A = A1 ⊗ A2 .
Example 5.6 Separability allows for saving complexity. Consider for this the problem of finding the inverse of a matrix A, for example, for solving a right hand side problem of the form Ax = b. Computing the inverse of a general matrix of size m × m is of cubic order, i.e., O(m³). Take for example

A = \begin{pmatrix}
2 & 3 & 4 & 6 \\
-5 & 6 & -10 & 12 \\
10 & 15 & 14 & 21 \\
-25 & 30 & -35 & 42
\end{pmatrix}
= \begin{pmatrix} 1 & 2 \\ 5 & 7 \end{pmatrix} \otimes \begin{pmatrix} 2 & 3 \\ -5 & 6 \end{pmatrix}.
\min_{A,B} \| F - B \otimes A \|_2^2,

that is,

b_{mn} = \frac{\mathrm{tr}(A^H F_{mn})}{\mathrm{tr}(A^H A)} = \mathrm{vec}[A]^H \mathrm{vec}[F_{mn}],

due to the norm constraint on A. We now have to minimize

\min_A \sum_{l=1}^{p_1} \sum_{k=1}^{p_2} \left\| F_{kl} - \mathrm{tr}(A^H F_{kl}) \, A \right\|_F^2,

the solution of which is given by the eigenvector associated with the largest eigenvalue:

\mathrm{vec}[A] = \arg\max\mathrm{eigvec} \sum_{l=1}^{p_1} \sum_{k=1}^{p_2} \mathrm{vec}[F_{kl}] \, \mathrm{vec}[F_{kl}]^H.
While the decomposition works well in case the matrix is separable, what would happen if it is not? In case of an arbitrary matrix, the decomposition would extract that separable part which an LS solution allows. After subtracting it from the original matrix, the remaining part describes the LS error term, orthogonal to the separated part. The error can thus be used to evaluate how well a matrix is separable.
[Figure 5.1: Example of a 3-way tensor.]
A matrix A of rank r can be split

A = \sum_{i=1}^{r} \sigma_i u_i v_i^H = \sum_{i=1}^{r} \left( \sqrt{\sigma_i} u_i \right) \left( \sqrt{\sigma_i} v_i \right)^H = \tilde{U}_1 \tilde{V}_1^H

into products. Low rank means data compression. This is equivalent to the requirement

\| A \|_* = \min \frac{1}{2} \left( \| \tilde{U} \|_F^2 + \| \tilde{V} \|_F^2 \right) \quad \text{s.th. } A = \tilde{U} \tilde{V}^H,

with the nuclear norm

\| A \|_* = \sum_{r=1}^{\mathrm{rank}(A)} \sigma_r.
This norm provides something similar to sparsity but for matrices.3 The rank is thus the
minimum number of outer vector products, required to approximate a given matrix A with
zero error. Matrix A is a special case of a tensor, a so-called two-way tensor.
Now let us consider an extension of the matrix concept, the n-way tensor. Figure 5.1
depicts a 3-way tensor.
Example 5.7 Consider, for example, a video stream, i.e., a sequence of matrices (frames) F_k ∈ IR^{l_1 × l_3} over time k = 1,2,...,l_2. We thus have a three-dimensional array (3-way tensor) F of dimension l_1 × l_2 × l_3. Such a 3-way tensor F can be approximated by:

\min_{v_{1,i}, v_{2,i}, v_{3,i}} \left\| F - \sum_{i=1}^{\mathrm{rank}(F)} v_{1,i} \otimes v_{2,i} \otimes v_{3,i} \right\|_2.

It is known that for such a tensor, its rank is bounded by

\min(l_1, l_2, l_3) \le \mathrm{rank}(F) \le \min(l_1 l_2, l_2 l_3, l_1 l_3).
To find the rank of a tensor is in general NP-hard!
We can add arbitrarily many dimensions to this construct. With every additional dimension, the data amount increases substantially. If a certain pattern is of interest, the time to search through the entire tensor would be unfeasibly long. A practical approach is to first apply a separation into vectors and/or matrices and then apply a search algorithm, and to design decompositions that are based on a fraction of the data and run online without storing the entire data first.
Applications of the Hadamard transform can be found in source coding for speech, image and video processing as well as in mobile communications such as the UMTS wireless standard. Often normalized Hadamard matrices are used; applying 1/\sqrt{n} as normalization we obtain unitary (orthogonal) matrices. The advantage of such structured matrices is that a significant amount of operations can be saved (separability). Where typically n² operations per vector multiplication with an n × n matrix are required, this complexity can be reduced to n log_2(n) operations. This is often called the Fast Hadamard Transform.
The operation can be applied straightforwardly with 12 operations, or identically with only 8 = 4 log_2(4) operations due to structuring. We set:

z_1 = H_2 \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} a+b \\ a-b \end{pmatrix}, \qquad
z_2 = H_2 \begin{pmatrix} c \\ d \end{pmatrix} = \begin{pmatrix} c+d \\ c-d \end{pmatrix}, \qquad
z = \begin{pmatrix} z_1 + z_2 \\ z_1 - z_2 \end{pmatrix}.
Figure 5.2 depicts the signal flow graph for this example.
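A small recursive NumPy sketch (added for illustration) of the fast Hadamard transform: the length-n transform is computed from two length-n/2 transforms with only additions and subtractions, giving n log_2(n) operations instead of n².

import numpy as np

def fht(x):
    # Fast (Sylvester) Hadamard transform of a vector of length n = 2^m.
    x = np.asarray(x, dtype=float)
    n = len(x)
    if n == 1:
        return x
    z1 = fht(x[:n // 2])             # transform of the upper half
    z2 = fht(x[n // 2:])             # transform of the lower half
    return np.concatenate([z1 + z2, z1 - z2])

def hadamard(n):
    # Explicit Hadamard matrix by Sylvester construction, for comparison.
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

x = np.array([1.0, 2.0, 3.0, 4.0])
print(fht(x))                        # [10, -2, -4, 0]
print(hadamard(4) @ x)               # identical result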
Theorem 5.4 Let H_n with n = 2^m be a Hadamard matrix built by Sylvester construction; then we have:

H_{2^m} = M_{2^m}^{(1)} M_{2^m}^{(2)} \cdots M_{2^m}^{(m)}, \qquad M_{2^m}^{(i)} = I_{2^{i-1}} \otimes H_2 \otimes I_{2^{m-i}}.

Proof:
The proof goes by induction. We obtain the result immediately for m = 1. Assume the result is true for m; is it also true for m + 1? For 0 < i < m + 1 we should have:

M_{2^{m+1}}^{(i)} = I_{2^{i-1}} \otimes H_2 \otimes I_{2^{m+1-i}} = I_{2^{i-1}} \otimes H_2 \otimes I_{2^{m-i}} \otimes I_2 = M_{2^m}^{(i)} \otimes I_2.

Thus we have:

M_{2^{m+1}}^{(1)} M_{2^{m+1}}^{(2)} \cdots M_{2^{m+1}}^{(m+1)} = (M_{2^m}^{(1)} \otimes I_2)(M_{2^m}^{(2)} \otimes I_2) \cdots (M_{2^m}^{(m)} \otimes I_2)(I_{2^m} \otimes H_2)
= (M_{2^m}^{(1)} M_{2^m}^{(2)} \cdots M_{2^m}^{(m)} \otimes I_2)(I_{2^m} \otimes H_2)
= (H_{2^m} \otimes I_2)(I_{2^m} \otimes H_2) = H_{2^m} \otimes H_2 = H_{2^{m+1}}.
Example 5.10 Consider H_4 = M_4^{(1)} M_4^{(2)}. Figure 5.3 depicts the two matrices; only half of their elements are non-zero.
With the HT we can also operate with other symbols than {0,1} or {−1,1}.
Definition 5.6 The Discrete Fourier Transform describes an operation mapping a discrete time sequence into a discrete Fourier sequence:

y_l = \sum_{k=0}^{N-1} w_N^{kl} x_k; \quad l = 0,1,...,N-1.
Example 5.12 Consider a vector with the entries x = [a, b, c, d, e, f]^T. By a DFT this vector can be mapped into the Fourier domain:

y = \begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 \\
1 & w_6^1 & w_6^2 & w_6^3 & w_6^4 & w_6^5 \\
1 & w_6^2 & w_6^4 & w_6^6 & w_6^8 & w_6^{10} \\
1 & w_6^3 & w_6^6 & w_6^9 & w_6^{12} & w_6^{15} \\
1 & w_6^4 & w_6^8 & w_6^{12} & w_6^{16} & w_6^{20} \\
1 & w_6^5 & w_6^{10} & w_6^{15} & w_6^{20} & w_6^{25}
\end{pmatrix} x =
\begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 \\
1 & w_6^1 & w_6^2 & w_6^3 & w_6^4 & w_6^5 \\
1 & w_6^2 & w_6^4 & w_6^0 & w_6^2 & w_6^4 \\
1 & w_6^3 & w_6^0 & w_6^3 & w_6^0 & w_6^3 \\
1 & w_6^4 & w_6^2 & w_6^0 & w_6^4 & w_6^2 \\
1 & w_6^5 & w_6^4 & w_6^3 & w_6^2 & w_6^1
\end{pmatrix} x = F_6 x.
2. Periodicity
xk+N = xk , yl+N = yl .
3. Linearity
DFT[αxk + βuk ] = αDFT[xk ] + βDFT[uk ].
4. Circular Symmetry
Consider a sequence x_k; k = 0,1,...,N−1 of length N. Take the first n_0 terms away and append them at the end (using the periodic extension x_{k+N} = x_k):

\sum_{k=n_0}^{N-1+n_0} x_k w_N^{kl} = \sum_{k=n_0}^{N-1} x_k w_N^{kl} + \sum_{k=N}^{N-1+n_0} x_k w_N^{kl}
= \sum_{k=n_0}^{N-1} x_k w_N^{kl} + \sum_{k=0}^{n_0-1} x_{k+N} \, w_N^{(k+N)l}
= \sum_{k=n_0}^{N-1} x_k w_N^{kl} + \sum_{k=0}^{n_0-1} x_k w_N^{kl} = \sum_{k=0}^{N-1} x_k w_N^{kl} = y_l.
5. Circular shift
In contrast to the previous property we now shift the sequence by n_0 values circularly:

\sum_{k=0}^{N-1} x_{k-n_0} w_N^{kl} = \sum_{k=0}^{N-1} x_{k-n_0} w_N^{l(k-n_0+n_0)}
= w_N^{l n_0} \sum_{k=0}^{N-1} x_{k-n_0} w_N^{l(k-n_0)} = w_N^{l n_0} y_l.
6. Circular Convolution
Let DFT[x_k] = y_l and DFT[u_k] = z_l. The circular convolution of the two sequences corresponds in the DFT domain to the product w_l = y_l z_l = DFT[v_k]:

v_m = \sum_{k=0}^{N-1} x_k u_{m-k}; \quad m = 0,1,...,N-1,

\begin{pmatrix} v_0 \\ v_1 \\ \vdots \\ v_{N-1} \end{pmatrix} =
\begin{pmatrix}
u_0 & u_{-1} = u_{N-1} & \cdots & u_{-N+1} = u_1 \\
u_1 & u_0 & \cdots & u_{-N+2} = u_2 \\
\vdots & & \ddots & \vdots \\
u_{N-1} & u_{N-2} & \cdots & u_0
\end{pmatrix} x =
\begin{pmatrix}
x_0 & x_{-1} & \cdots & x_{-N+1} \\
x_1 & x_0 & \cdots & x_{-N+2} \\
\vdots & & \ddots & \vdots \\
x_{N-1} & x_{N-2} & \cdots & x_0
\end{pmatrix} u.
Example 5.13 Consider the convolution with a short channel {h_0, h_1, h_2}:

\begin{pmatrix} y_k \\ y_{k-1} \\ y_{k-2} \end{pmatrix} =
\begin{pmatrix}
h_0 & h_1 & h_2 & & \\
& h_0 & h_1 & h_2 & \\
& & h_0 & h_1 & h_2
\end{pmatrix}
\begin{pmatrix} x_k \\ x_{k-1} \\ x_{k-2} \\ x_{k-3} \\ x_{k-4} \end{pmatrix},

\begin{pmatrix} y_k \\ y_{k-1} \\ y_{k-2} \\ \tilde{y}_{k-3} \\ \tilde{y}_{k-4} \end{pmatrix} =
\begin{pmatrix}
h_0 & h_1 & h_2 & & \\
& h_0 & h_1 & h_2 & \\
& & h_0 & h_1 & h_2 \\
h_2 & & & h_0 & h_1 \\
h_1 & h_2 & & & h_0
\end{pmatrix}
\begin{pmatrix} x_k \\ x_{k-1} \\ x_{k-2} \\ x_{k-3} \\ x_{k-4} \end{pmatrix}.

The first equation depicts the true transmission scenario, i.e., a convolution. In the second equation the channel matrix is augmented to be a circulant matrix. Due to this circulant matrix an FFT can be applied to offer a low complexity solution. There are two options: 1) simply throw the additional terms {\tilde{y}_{k-3}, \tilde{y}_{k-4}} away (Overlap Add method) or 2) save them for the continuation of the sequence x_k (Overlap Save method). A small numerical sketch of FFT-based circular convolution follows after this property list.
7. Backwards sequence
8. Parseval's Theorem
Let DFT[x_k] = y_l and DFT[u_k] = z_l:

\sum_{k=0}^{N-1} x_k u_k^* = \frac{1}{N} \sum_{l=0}^{N-1} y_l z_l^*.
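The announced sketch: a small NumPy check (illustrative values) that multiplying the DFTs of two sequences and transforming back equals the cyclic convolution, which is also what the augmented circulant channel matrix of Example 5.13 produces.

import numpy as np

h = np.array([1.0, 0.5, 0.25])                 # short channel {h0, h1, h2}
x = np.array([1.0, -2.0, 3.0, 0.5, -1.0])      # block of N = 5 samples

N = len(x)
h_pad = np.concatenate([h, np.zeros(N - len(h))])

# Circular convolution via the DFT domain: DFT[v] = DFT[h] * DFT[x].
v_fft = np.fft.ifft(np.fft.fft(h_pad) * np.fft.fft(x)).real

# Same result with an explicit circulant matrix (columns are cyclic shifts).
C = np.array([np.roll(h_pad, k) for k in range(N)]).T
v_mat = C @ x

print(np.allclose(v_fft, v_mat))               # True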
Question: Can the DFT, similarly to the FHT, be described so that a fast version with less complexity is possible? Consider two matrices

F_2 = \begin{pmatrix} 1 & 1 \\ 1 & w_6^3 \end{pmatrix}; \qquad
F_3 = \begin{pmatrix} 1 & 1 & 1 \\ 1 & w_6^2 & w_6^4 \\ 1 & w_6^4 & w_6^8 \end{pmatrix}.
Now compute

F_2 \otimes F_3 = \begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 \\
1 & w_6^2 & w_6^4 & w_6^0 & w_6^2 & w_6^4 \\
1 & w_6^4 & w_6^2 & w_6^0 & w_6^4 & w_6^2 \\
1 & w_6^0 & w_6^0 & w_6^3 & w_6^3 & w_6^3 \\
1 & w_6^2 & w_6^4 & w_6^3 & w_6^5 & w_6^1 \\
1 & w_6^4 & w_6^2 & w_6^3 & w_6^1 & w_6^5
\end{pmatrix} \ne F_6.
Including input and output vector we find:

\begin{pmatrix} y_0 \\ y_4 \\ y_2 \\ y_3 \\ y_1 \\ y_5 \end{pmatrix} =
\begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 \\
1 & w_6^2 & w_6^4 & w_6^0 & w_6^2 & w_6^4 \\
1 & w_6^4 & w_6^2 & w_6^0 & w_6^4 & w_6^2 \\
1 & w_6^0 & w_6^0 & w_6^3 & w_6^3 & w_6^3 \\
1 & w_6^2 & w_6^4 & w_6^3 & w_6^5 & w_6^1 \\
1 & w_6^4 & w_6^2 & w_6^3 & w_6^1 & w_6^5
\end{pmatrix}
\begin{pmatrix} x_0 \\ x_2 \\ x_4 \\ x_3 \\ x_5 \\ x_1 \end{pmatrix}; \qquad
\begin{pmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \end{pmatrix} =
\begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 \\
1 & w_6^1 & w_6^2 & w_6^3 & w_6^4 & w_6^5 \\
1 & w_6^2 & w_6^4 & w_6^0 & w_6^2 & w_6^4 \\
1 & w_6^3 & w_6^0 & w_6^3 & w_6^0 & w_6^3 \\
1 & w_6^4 & w_6^2 & w_6^0 & w_6^4 & w_6^2 \\
1 & w_6^5 & w_6^4 & w_6^3 & w_6^2 & w_6^1
\end{pmatrix}
\begin{pmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix}.
Obviously, we did not obtain the correct DFT matrix. However, it is a DFT, just with a different order of the input and output elements. The method is thus sufficient to construct a fast variant of the DFT, the so-called Fast Fourier Transform (FFT). The corresponding signal flow graph is shown in Figure 5.4, where also a complexity comparison with the direct approach, i.e., multiplication with the F_6 matrix, is given. Given the dimension
N = NL NL−1 ...N2 N1 of the matrices and assuming that all Nk are prime, we find the N -
dimensional FFT as:
F = FNL ⊗ FNL−1 ⊗ ... ⊗ FN2 ⊗ FN1 .
The drawback is that the ordering is different to the DFT. However, by suitable permutation
matrices for in- and output Px and Py this can be solved without increasing complexity.
FN = Py (FNL ⊗ FNL−1 ⊗ ... ⊗ FN2 ⊗ FN1 )Px .
Note that also the classical FFT (by Cooley and Tukey) requires a reordering of the entries!
[Figure 5.4: Signal flow graph of the F_2 ⊗ F_3 FFT example. Complexity: 2·(3·2) + 3·(2·1) = 18 < 5² = 25 for the direct approach.]
Definition 5.7 A (finite) m × n or infinite matrix is called a Toeplitz matrix if its entries a_{ij} only depend on the distance i − j:

T_{mn} = \begin{pmatrix}
h_0 & \cdots & h_{L-1} & & \\
\vdots & \ddots & & \ddots & \\
h_{1-L} & \cdots & h_0 & \cdots & h_{L-1} \\
& \ddots & \vdots & \ddots & \vdots \\
& & h_{1-L} & \cdots & h_0
\end{pmatrix}.
\begin{pmatrix} r_{L-1} \\ r_{L-2} \\ r_{L-3} \\ \vdots \\ r_2 \end{pmatrix} =
\begin{pmatrix}
h_0 & h_1 & h_2 & & \\
& h_0 & h_1 & h_2 & \\
& & \ddots & \ddots & \ddots \\
& & & h_0 & h_1 \;\; h_2
\end{pmatrix}
\begin{pmatrix} a_{L-1} \\ a_{L-2} \\ a_{L-3} \\ \vdots \\ a_0 \end{pmatrix} +
\begin{pmatrix} v_{L-1} \\ v_{L-2} \\ v_{L-3} \\ \vdots \\ v_2 \end{pmatrix} = H a + v.
5
Otto Toeplitz (1881-1940) was a German mathematician.
Thus, the system of equations exhibits square Toeplitz structure and can be solved by a low complexity method (O(n²): Levinson-Durbin algorithm). We further conclude that LS as a method does not necessarily destroy the Toeplitz structure. However, if the order of the channel L is large, the problem can become numerically challenging.
On the left hand we have a Toeplitz form while on the right hand side we have a circulant
matrix.
Definition 5.8 (Circulant Matrix) A circulant n × n matrix C_n is of the form

C_n = \begin{pmatrix}
h_0 & h_1 & h_2 & & & \\
& h_0 & h_1 & h_2 & & \\
& & \ddots & \ddots & \ddots & \\
& & & h_0 & h_1 & h_2 \\
h_2 & & & & h_0 & h_1 \\
h_1 & h_2 & & & & h_0
\end{pmatrix},

where each row is obtained by cyclically shifting the previous row to the right.
Note that Toeplitz as well as circulant matrices are commutative, i.e.:
GH = HG.
Note that the two matrices above only differ by three elements. This difference remains
constant even if the matrices keep growing. Thus, the larger the matrices become the more
similar they become.
Note that the trace of the matrices is equivalent to their sum of eigenvalues, thus

\lim_{n\to\infty} \frac{1}{n} \mathrm{tr}\, A_n^l = \lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} \left( \lambda_{n,k}(A_n) \right)^l.
Theorem 5.5 All circulant matrices of the same size have the same eigenvectors. The eigenvectors are given by the DFT matrix of the corresponding dimension n, i.e.,

F_n^H C_n F_n = \Lambda. \tag{5.1}

Conversely, every matrix of the form

C_n = F_n \Lambda F_n^H

is a circulant matrix.
Thus, different to general matrices, the knowledge of the size n is sufficient to know
all eigenvectors. With this knowledge the eigenvalues can be computed. Even more, the
computation of the eigenvalues in Equation (5.1) is not of cubic complexity but requires
only two FFTs, thus O(n log(n)).
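A NumPy sketch (added for illustration; the unitary normalization 1/√n is assumed here) verifying that the DFT matrix diagonalizes a circulant matrix and that its eigenvalues are obtained from a single FFT of the first column, without any eigenvalue solver.

import numpy as np

n = 6
c = np.array([1.0, 0.5, 0.25, 0.0, 0.0, 0.0])              # first column of the circulant

# Build the circulant matrix: column k is the first column cyclically shifted by k.
C = np.array([np.roll(c, k) for k in range(n)]).T

# Unitary DFT matrix F.
F = np.fft.fft(np.eye(n)) / np.sqrt(n)

Lam = F @ C @ F.conj().T                                    # should be diagonal
print("off-diagonal energy:", np.linalg.norm(Lam - np.diag(np.diag(Lam))))

# Eigenvalues directly from one FFT of the first column.
print(np.allclose(np.diag(Lam), np.fft.fft(c)))             # True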
The problem of the equalizer is to find a linear filter G[r] so that a reappears. As the additional entries x do not matter, we can generate them, for example, by repeating the first symbols of the block at the end of the transmission:

\begin{pmatrix} r_{L-1} \\ r_{L-2} \\ r_{L-3} \\ \vdots \\ r_2 \\ x \\ x \end{pmatrix} =
\begin{pmatrix}
h_0 & h_1 & h_2 & & & & \\
& h_0 & h_1 & h_2 & & & \\
& & \ddots & \ddots & \ddots & & \\
& & & h_0 & h_1 & h_2 & \\
& & & & h_0 & h_1 & h_2 \\
& & & & & h_0 & h_1 \;\; h_2
\end{pmatrix}
\begin{pmatrix} a_{L-1} \\ a_{L-2} \\ a_{L-3} \\ \vdots \\ a_0 \\ a_{L-1} \\ a_{L-2} \end{pmatrix}
+ \begin{pmatrix} v_{L-1} \\ v_{L-2} \\ v_{L-3} \\ \vdots \\ v_2 \\ \vdots \\ \vdots \end{pmatrix}.
We now recognize that a small part of the data appears twice, at the beginning and at the end of the transmission. If the part from the end is replicated at the beginning, we call it a cyclic prefix, otherwise a cyclic postfix. By this we now have an equivalent form of transmission that makes the channel appear cyclic although physically it is not. At the receiver end we can decode simply by applying FFT matrices F_n:
r = C_n a + v
F_n r = F_n C_n a + F_n v = \underbrace{F_n C_n F_n^H}_{\Lambda_n} F_n a + F_n v.

We recognize a linear distortion by the diagonal matrix \Lambda_n. Assuming that all eigenvalues are non-zero, such distortion can be compensated by its inverse:

\Lambda_n^{-1} F_n r = F_n a + \Lambda_n^{-1} F_n v
F_n^H \Lambda_n^{-1} F_n r = a + F_n^H \Lambda_n^{-1} F_n v
a \approx \underbrace{F_n^H \Lambda_n^{-1} F_n}_{G_n} r.
The FFT operations require n log(n) operations each, and the inverse \Lambda_n^{-1} is of the order of n divisions. Note that we do not claim this to be the optimal decoder. As the noise is changed by \Lambda_n^{-1} and we do not take this into account, there are better ones. Nevertheless, this is the basic method of current WiFi and LTE transmission methods.
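A compact NumPy simulation (added for illustration; channel, block length and symbols are made up, noise omitted) of this one-tap frequency-domain equalizer: a cyclic prefix turns the linear channel into a circulant one, so FFT, division by the channel frequency response, and IFFT recover the transmitted block.

import numpy as np

rng = np.random.default_rng(7)
n = 16                                             # OFDM block length (assumed)
h = np.array([1.0, 0.4, 0.2])                      # channel impulse response (assumed)
L = len(h)

a = rng.choice([-1.0, 1.0], size=n)                # transmitted symbols (BPSK, for simplicity)

# Transmit with a cyclic prefix: prepend the last L-1 symbols of the block.
tx = np.concatenate([a[-(L - 1):], a])
rx = np.convolve(tx, h)                            # linear (physical) channel
r = rx[L - 1:L - 1 + n]                            # discard prefix samples -> cyclic channel

# One-tap equalizer in the frequency domain: divide by the channel eigenvalues.
lam = np.fft.fft(h, n)                             # eigenvalues of the equivalent circulant
a_hat = np.fft.ifft(np.fft.fft(r) / lam).real

print(np.allclose(a_hat, a))                       # True (noise-free case)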
Theorem 5.6 The eigenvalues of a circulant matrix of size n × n are given by its Fourier transform evaluated at the frequencies \Omega_k = \frac{2\pi k}{n}; k = 0,1,...,n-1.
Proof: Let us denote the Fourier transform of the circulant set c_0, c_1, ..., c_{n-1} by

\tilde{c}\left(e^{j\Omega}\right) = \sum_{k=0}^{n-1} c_k e^{-jk\Omega},

with the twiddling factor w_n = e^{-j\frac{2\pi}{n}}. Comparing this with its identity

F_n \Lambda_n = \begin{pmatrix}
\lambda_0 & \lambda_1 & \cdots & \lambda_{n-1} \\
\lambda_0 & w_n \lambda_1 & \cdots & w_n^{n-1} \lambda_{n-1} \\
\lambda_0 & w_n^2 \lambda_1 & \cdots & w_n^{n-2} \lambda_{n-1} \\
\vdots & & & \vdots \\
\lambda_0 & w_n^{n-1} \lambda_1 & \cdots & w_n \lambda_{n-1}
\end{pmatrix},
\lim_{n\to\infty} \| T_n \|_{2,\mathrm{ind}} = \lim_{n\to\infty} \sup_x \frac{\| T_n x \|_2}{\| x \|_2} = \sup_{\Omega} \left| H(e^{j\Omega}) \right| \tag{5.2}
valid for Toeplitz matrices. We connected there a statement on matrices with classical
Fourier Transform terms. We simply assumed at the time that the worst case vector, i.e.,
the vector that maximizes the ratio above, is given by a harmonic excitation (2.3). We now
understand that this assumption was indeed correct as with growing size n the Toeplitz
matrix becomes asymptotically equivalent to a circulant matrix. The eigenvector for that
is indeed a harmonic excitation with a single frequency.
A further means of connecting Toeplitz matrix properties with those of the Fourier Transform is given in the form of the following theorem.

Theorem 5.7 (Szegö) Given a continuous function g(x), the eigenvalues \lambda_{n,k}; k = 0,1,...,n-1 of a square Toeplitz matrix T_n with elements {h_0, h_1, ..., h_{L-1}} asymptotically satisfy the following condition:

\lim_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n-1} g(\lambda_{n,k}) = \frac{1}{2\pi} \int_{-\pi}^{\pi} g\!\left( H(e^{j\Omega}) \right) d\Omega,

where H(e^{j\Omega}) = \sum_{k=0}^{L-1} h_k e^{-j\Omega k}.
Once this holds, a continuous function g(x) can be constructed by a linear combination of polynomials. For Equation (5.3) to hold we recall that the eigenvalues of a circulant matrix are identical to the corresponding Fourier transform at equidistant frequency points \Omega_k = \frac{2\pi}{n} k. Thus

\frac{1}{n} \sum_{k=0}^{n-1} \lambda_{n,k}^{l} = \frac{1}{n} \sum_{k=1}^{n} H^l\!\left( e^{-j\frac{2\pi}{n}k} \right).

With growing n the sum can be replaced by an integral over the range 2π and we are done.
As such terms typically appear in the capacity of multiple antenna transmission systems
(see Example 4.34), the theorem is helpful to compute the capacity bounds.
Example 5.17 Consider the Shannon capacity for OFDM transmission over a channel H (in the frequency domain), where we have to compute

\max_{\mathrm{tr}(R) \le P} \ln\det\left( I + \frac{H R H^H}{\sigma_v^2} \right).

Often the simpler expression

\ln\det\left( I + \frac{H H^H}{\sigma_v^2} \right),

known as Mutual Information, suffices. For linear time-invariant systems, the matrix H is of Toeplitz form. For very large matrices H_n we can now compute such an expression as

\lim_{n\to\infty} \frac{1}{n} \ln\det\left( I + \frac{H_n H_n^H}{\sigma_v^2} \right),

as we know that the Toeplitz matrices are asymptotically equivalent to circulant matrices: H_n → C_n. The Hermitian of a circulant matrix remains a circulant matrix, leaving us with a circulant matrix C_n C_n^H instead. The expression I + C_n C_n^H/\sigma_v^2 can thus also be interpreted as a circulant matrix and we can compute the desired value in two ways:

\lim_{n\to\infty} \frac{1}{n} \ln\det\left( I + \frac{H_n H_n^H}{\sigma_v^2} \right) = \lim_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n-1} \ln \lambda_{n,k}\!\left( I + C_n C_n^H/\sigma_v^2 \right),

where the eigenvalues \lambda_{n,k} of I + C_n C_n^H/\sigma_v^2 can quickly be computed by an FFT. Alternatively, we can compute the desired expression also by

\lim_{n\to\infty} \frac{1}{n} \ln\det\left( I + \frac{H_n H_n^H}{\sigma_v^2} \right) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \ln\left( 1 + \frac{|H(e^{j\Omega})|^2}{\sigma_v^2} \right) d\Omega.