NOTES
EXAMPLES:
$$A = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}: \quad \chi_A(x) = (x-2)^2, \quad m_A(x) = (x-2)$$
$$B = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}: \quad \chi_B(x) = (x-2)^2, \quad m_B(x) = (x-2)^2$$
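As a quick numerical check of this example, here is a short Python sketch; numpy is used only to test which polynomials annihilate the matrices above:

    import numpy as np

    A = np.array([[2.0, 0.0], [0.0, 2.0]])
    B = np.array([[2.0, 1.0], [0.0, 2.0]])
    I = np.eye(2)

    print(np.allclose(A - 2 * I, 0))                  # True:  m_A(x) = x - 2
    print(np.allclose(B - 2 * I, 0))                  # False: x - 2 does not annihilate B
    print(np.allclose((B - 2 * I) @ (B - 2 * I), 0))  # True:  m_B(x) = (x - 2)^2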
If A and B are square matrices, A ⊕ B is defined to be the square matrix
$$A \oplus B = \begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}.$$
If C = A ⊕ B, then (1) the characteristic polynomial χ_C(λ) is the product χ_A(λ) · χ_B(λ) of the characteristic polynomials of A and B; and (2) the minimal polynomial m_C(λ) is the least common multiple of the minimal polynomials m_A(λ) and m_B(λ), that is, the monic polynomial m_C(λ) of lowest degree such that m_A(λ) | m_C(λ) and m_B(λ) | m_C(λ).
Proof: The first result follows from the observation that
$$\det\big((\lambda I - A) \oplus (\lambda I - B)\big) = \det(\lambda I - A) \cdot \det(\lambda I - B).$$
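The first result can be illustrated numerically; the following Python sketch, with arbitrarily chosen 2 × 2 test blocks, checks that the characteristic polynomial of A ⊕ B is the product of those of A and B:

    import numpy as np
    from scipy.linalg import block_diag

    A = np.array([[2.0, 1.0], [0.0, 3.0]])   # illustrative test blocks
    B = np.array([[5.0, 0.0], [1.0, 5.0]])
    C = block_diag(A, B)                      # C = A (+) B

    # np.poly returns the coefficients of the monic characteristic polynomial.
    print(np.allclose(np.poly(C), np.polymul(np.poly(A), np.poly(B))))  # True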
A vector space V is the direct sum of subspaces X_1, . . . , X_m, written
$$V = X_1 \oplus X_2 \oplus \cdots \oplus X_m,$$
if every vector in V can be written uniquely as a sum of vectors in the subspaces X_i. For example, if e_1, . . . , e_n is a basis for V and L(e_i) is the set of all scalar multiples of the vector e_i (the span of e_i, a one-dimensional subspace of V), then V = L(e_1) ⊕ · · · ⊕ L(e_n). The necessary and sufficient condition that V = X ⊕ Y for a finite-dimensional vector space V is that dim V = dim X + dim Y and that X ∩ Y = 0.
If T : V −→ V is a linear mapping from a vector space V to itself, a subspace
W ⊂ V is T -invariant if T W ⊂ W . For example, the kernel and image of T are
T -invariant; and any eigenspace E(λ) of the mapping T is T -invariant.
Suppose that T : V −→ V is a linear mapping from a vector space V to itself
and that V = X ⊕ Y where X and Y are T -invariant subspaces of V . Since
X is T -invariant, the restriction of T to the subspace X is a well defined linear
mapping T |X : X −→ X, and similarly for the restriction T |Y . If {e1 , . . . , em }
is a basis for X, in terms of which the transformation T |X : X −→ X is
represented by a matrix A, and if {f1 , . . . , fn } is a basis for Y , in terms of
which the transformation T |Y : Y −→ Y is represented by a matrix B, then
{e1 , . . . , em , f1 , . . . , fn } is a basis for V and in terms of this basis the matrix
describing the linear transformation T is A ⊕ B. Conversely for the linear
transformation T defined by a matrix A ⊕ B, where A is an m × m matrix and
B is an n × n matrix, the subspaces X spanned by the basis vectors e1 , . . . , em
and Y spanned by the basis vectors em+1 , . . . , em+n are invariant subspaces, on
which the action of T is represented by the matrices A and B, and V = X ⊕ Y .
Here e_i is the vector with entry 1 in the i-th place and entries 0 elsewhere, one of the standard basis vectors for the vector space R^{m+n}.
If T : V −→ V is a linear transformation from a vector space V to itself, a vector v ∈ V is said to be a cyclic vector for T if the vectors v, T v, T^2 v, . . . span the vector space V. If v is a cyclic vector for a transformation T on a finite-dimensional vector space V, there is an integer n such that the vectors v, T v, . . . , T^{n−1} v are linearly independent but the vectors v, T v, . . . , T^n v are linearly dependent; consequently there is an identity of the form
$$a_0 v + a_1 T v + \cdots + a_{n-1} T^{n-1} v + T^n v = 0 \tag{1}$$
for some scalars a_0, a_1, . . . , a_{n−1}. It follows by induction that T^m v is in the space spanned by the vectors v, T v, . . . , T^{n−1} v for every m ≥ n, and hence that the vectors e_i = T^{i−1} v for 1 ≤ i ≤ n are a basis for V. Note that T e_i = e_{i+1} for 1 ≤ i ≤ n − 1 and that T e_n = −a_0 e_1 − a_1 e_2 − · · · − a_{n−1} e_n, and therefore that the matrix A representing the linear mapping T in terms of this basis has the form
$$A = \begin{pmatrix}
0 & 0 & 0 & \cdots & 0 & -a_0 \\
1 & 0 & 0 & \cdots & 0 & -a_1 \\
0 & 1 & 0 & \cdots & 0 & -a_2 \\
0 & 0 & 1 & \cdots & 0 & -a_3 \\
 & & & \ddots & & \vdots \\
0 & 0 & 0 & \cdots & 1 & -a_{n-1}
\end{pmatrix}.$$
This matrix is called the companion matrix of the polynomial p(λ) = a_0 + a_1 λ + · · · + a_{n−1} λ^{n−1} + λ^n. Conversely, if A is the companion matrix of a polynomial p(λ) as above, then A e_i = e_{i+1} for 1 ≤ i ≤ n − 1 and A e_n = −a_0 e_1 − a_1 e_2 − · · · − a_{n−1} e_n, where e_i are the standard basis vectors in R^n; hence e_1 is a cyclic vector for the linear mapping defined by this matrix.
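A short Python sketch illustrating this construction; the helper companion and the test polynomial p(x) = (x − 1)(x − 2)(x − 3) are chosen here only for illustration:

    import numpy as np

    def companion(a):
        # Companion matrix, in the layout above, of p(x) = a[0] + a[1] x + ... + x^n.
        n = len(a)
        C = np.zeros((n, n))
        C[1:, :-1] = np.eye(n - 1)      # entries 1 just below the main diagonal
        C[:, -1] = -np.asarray(a)       # last column (-a_0, ..., -a_{n-1})
        return C

    a = [-6.0, 11.0, -6.0]              # p(x) = x^3 - 6x^2 + 11x - 6
    A = companion(a)

    e1 = np.zeros(3); e1[0] = 1.0
    krylov = np.column_stack([e1, A @ e1, A @ A @ e1])
    print(np.linalg.matrix_rank(krylov))                       # 3: e1 is cyclic
    print(np.allclose(np.poly(A), [1.0, -6.0, 11.0, -6.0]))    # chi_A = p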
Theorem 4 If A is the companion matrix of the polynomial p(λ) then
χA (λ) = mA (λ) = p(λ).
Proof: The identity (1) for the linear transformation A = T and the cyclic vector v = e_1 shows that p(A)e_1 = 0; and then 0 = A^{i−1} p(A) e_1 = p(A) A^{i−1} e_1 = p(A) e_i for 1 ≤ i ≤ n, so that p(A) = 0 since the vectors e_i are a basis. If f(λ) = b_0 + b_1 λ + · · · + b_m λ^m is a polynomial of degree m < n such that f(A) = 0 then similarly
$$0 = f(A) e_1 = b_0 e_1 + b_1 A e_1 + \cdots + b_m A^m e_1 = b_0 e_1 + b_1 e_2 + \cdots + b_m e_{m+1};$$
since the vectors e_1, e_2, . . . , e_{m+1} are linearly independent for any m < n, it follows that b_i = 0 and hence that f(λ) = 0. Therefore there are no nonzero polynomials f(λ) of degree m < n for which f(A) = 0, so p(λ) must be the monic polynomial of least degree for which p(A) = 0, and hence p(λ) = m_A(λ). Since m_A(λ) | χ_A(λ) and χ_A(λ) also is of degree n, it further follows that p(λ) = χ_A(λ).
There do not necessarily exist any cyclic vectors at all for a linear transfor-
mation T . The preceding theorem shows that if there is a cyclic vector for T
then mT (λ) = χT (λ); the converse is also true, but will not be needed here.
Actually it is more useful to write a matrix A as a direct sum of companion matrices of the simplest possible polynomials; over the complex numbers these are just the polynomials of the form p(λ) = (λ − a)^n. In this case the companion matrix takes a particularly convenient form in terms of another basis. In place of the basis e_i = T^{i−1} v for the cyclic vector v, consider the vectors f_i = (T − aI)^{n−i} v. Note that f_n = v = e_1, f_{n−1} = (T − aI)v = e_2 − a e_1, f_{n−2} = (T − aI)f_{n−1} = (T − aI)(e_2 − a e_1) = e_3 − 2a e_2 + a^2 e_1, and so on; so it is clear that the basis vectors e_i can be expressed as linear combinations of the vectors f_i, and hence that f_1, . . . , f_n is also a basis for V. For this basis T f_1 = T(T − aI)^{n−1} v = (T − aI)^n v + a(T − aI)^{n−1} v = p(T)v + a f_1, and p(T) = 0 by the preceding theorem, so that
$$T f_1 = a f_1 \tag{2}$$
and hence f_1 is an eigenvector for the eigenvalue a; and for 2 ≤ i ≤ n similarly T f_i = (T − aI)(T − aI)^{n−i} v + a(T − aI)^{n−i} v = f_{i−1} + a f_i, so that
$$T f_i = f_{i-1} + a f_i \quad \text{for } 2 \leq i \leq n. \tag{3}$$
It follows from equations (2) and (3) that the matrix B representing the transformation T in terms of the basis f_1, . . . , f_n has the form
$$B(n; a) = \begin{pmatrix}
a & 1 & 0 & 0 & \cdots & 0 & 0 \\
0 & a & 1 & 0 & \cdots & 0 & 0 \\
0 & 0 & a & 1 & \cdots & 0 & 0 \\
 & & & & \ddots & & \\
0 & 0 & 0 & 0 & \cdots & a & 1 \\
0 & 0 & 0 & 0 & \cdots & 0 & a
\end{pmatrix}.$$
This matrix is called a Jordan block.

Theorem 5 If B = B(n; a) is an n × n Jordan block then χ_B(λ) = m_B(λ) = (λ − a)^n; the matrix B has the single eigenvalue a, and the eigenspace E(a) is one-dimensional.
Proof: The Jordan block is similar to the companion matrix of the polynomial p(λ) = (λ − a)^n, since it arises from it by a change of basis, so it has the same characteristic and minimal polynomials as the companion matrix; hence χ_B(λ) = m_B(λ) = (λ − a)^n by the preceding theorem. The matrix B − aI has entries 1 just above the main diagonal and 0 elsewhere, so (B − aI)v = 0 only for scalar multiples of the vector
$$v = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = e_1;$$
hence the eigenspace E(a) = L(e_1) is one-dimensional.
A Jordan block of size n > 1 is thus not diagonalizable; but the Jordan
blocks provide normal forms for arbitrary matrices.
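The change of basis f_i = (T − aI)^{n−i} v can be illustrated numerically: the following Python sketch, with arbitrary test values of n and a, builds the companion matrix of p(λ) = (λ − a)^n, forms the basis f_i with v = e_1, and checks that the resulting matrix is the Jordan block B(n; a):

    import numpy as np

    n, a = 4, 2.0
    coeffs = np.poly([a] * n)                # coefficients of (x - a)^n, highest first
    C = np.zeros((n, n))
    C[1:, :-1] = np.eye(n - 1)               # companion matrix layout used above
    C[:, -1] = -coeffs[1:][::-1]             # last column (-a_0, ..., -a_{n-1})

    v = np.zeros(n); v[0] = 1.0              # the cyclic vector e_1
    N = C - a * np.eye(n)
    F = np.column_stack([np.linalg.matrix_power(N, n - i) @ v
                         for i in range(1, n + 1)])

    J = a * np.eye(n) + np.diag(np.ones(n - 1), 1)     # Jordan block B(n; a)
    print(np.allclose(np.linalg.inv(F) @ C @ F, J))    # True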
Theorem 6 For any linear transformation T : V −→ V of a finite-dimensional complex vector space V to itself, there is a basis of V in terms of which T is represented by a matrix that is a direct sum of Jordan blocks.
Proof: The proof is by induction on n = dim V; the case n = 1 is trivial, so suppose the result holds for all spaces of dimension less than n. First suppose that T is singular, so has a nontrivial kernel. Since W =
T (V ) ⊂ V is a T -invariant subspace with dim W = m < n it follows from the
inductive hypothesis that there is a basis of W for which T |W is represented by
a matrix
$$A_0 = B(n_1; a_1) \oplus B(n_2; a_2) \oplus \cdots \oplus B(n_k; a_k)$$
where B(ni ; ai ) are Jordan blocks; this corresponds to a direct sum decomposi-
tion
W = X1 ⊕ X2 ⊕ · · · ⊕ Xk
where Xi ⊂ W ⊂ V are T -invariant subspaces and T |Xi : Xi −→ Xi is rep-
resented by the matrix B(n_i; a_i). By the preceding theorem each subspace X_i contains, up to scalar multiples, a single eigenvector v_i with eigenvalue a_i. Let K ⊂ V be the
kernel of the linear transformation T , so dim K = n − m, and suppose that
dim(K ∩ W ) = r ≤ n − m. There are r linearly independent vectors vi ∈ K ∩ W
such that T (vi ) = 0, which are eigenvectors of T |W with eigenvalues 0; so
it can be supposed further that these are just the eigenvectors vi ∈ Xi for
1 ≤ i ≤ r. Extend these vectors to a basis v1 , . . . , vr , ur+1 , . . . , un−m for the
subspace K. The vectors ur+1 , . . . , un−m span a T -invariant subspace U ⊂ V
with dim U = n − m − r. On each of the subspaces Xi for 1 ≤ i ≤ r the
transformation T is represented by a Jordan block B(ni ; 0) in terms of a basis
e_{i,1}, e_{i,2}, . . . , e_{i,n_i}; so T e_{i,1} = 0 and e_{i,1} = v_i is one of the basis vectors of the kernel K, while T e_{i,j} = e_{i,j−1} for 2 ≤ j ≤ n_i. That clearly shows that the vectors e_{i,1}, . . . , e_{i,n_i−1} are in the image W = T(V), which of course they must be since X_i ⊂ W; but e_{i,n_i} ∈ W = T(V) as well, so there must be a vector f_i ∈ V for which e_{i,n_i} = T(f_i). The vector f_i is not contained in X_i, since no linear combination of the vectors in X_i can have as its image under T the vector e_{i,n_i};
hence X_i' = X_i ⊕ L(f_i) is a linear subspace of V with dim X_i' = dim X_i + 1, and X_i' is also a T-invariant subspace of V. Note that in terms of the basis e_{i,1}, e_{i,2}, . . . , e_{i,n_i}, f_i for X_i' the restriction T|X_i' : X_i' −→ X_i' is represented by
a Jordan block B(ni + 1; 0). The proof of this part of the theorem will be
concluded by showing that
$$V = X_1' \oplus \cdots \oplus X_r' \oplus X_{r+1} \oplus \cdots \oplus X_k \oplus L(u_{r+1}) \oplus \cdots \oplus L(u_{n-m}).$$
The dimension count is correct, since W = X_1 ⊕ · · · ⊕ X_k has dimension m, each X_i' increases the dimension by 1 for an additional r, and there are n − m − r of the vectors u_i. It is hence enough just to show that there can be no nontrivial linear relation of the form
$$0 = \sum_{i=1}^{k} c_i x_i + \sum_{i=1}^{r} c_i' f_i + \sum_{i=r+1}^{n-m} c_i'' u_i$$
where x_i ∈ X_i. Applying T to this relation yields
$$0 = \sum_{i=1}^{k} c_i T x_i + \sum_{i=1}^{r} c_i' e_{i,n_i}$$
since T u_i = 0; and since W = X_1 ⊕ · · · ⊕ X_k, this identity can hold only when c_i T x_i + c_i' e_{i,n_i} = 0 for each i; but e_{i,n_i} cannot be the image under T of any vector in X_i, so that c_i = c_i' = 0. Finally $\sum_{i=r+1}^{n-m} c_i'' u_i = 0$, and since the vectors u_i are linearly independent it follows that c_i'' = 0 as well.
For the case in which T is nonsingular, the linear transformation T has an eigenvalue λ over the complex numbers, so the transformation T − λI will be singular. The preceding part of the proof shows that T − λI can be represented by a matrix A that is a direct sum of Jordan blocks; and then T clearly is represented by the matrix A + λI, which is easily seen also to be a direct sum of Jordan blocks. That concludes the proof.
An equivalent statement of the preceding theorem is that any matrix is
similar over the complex numbers to a matrix that is a direct sum of Jordan
blocks; this is often called the Jordan normal or Jordan canonical form for the
matrix. These blocks actually are determined uniquely and characterize the
matrix up to similarity.
Theorem 7 Over the complex numbers, any matrix is similar to a unique direct
sum of Jordan blocks.
Proof: By the preceding theorem, over the complex numbers any matrix is
similar to one of the form
$$A = B(n_1; a_1) \oplus B(n_2; a_2) \oplus \cdots \oplus B(n_k; a_k) \tag{4}$$
where B(ni ; ai ) are Jordan blocks. The distinct eigenvalues of A are the distinct
roots of the characteristic polynomial χA (λ), so are uniquely determined. Each
Jordan block corresponds to a single eigenvector, so the number of distinct
Jordan blocks for which ai = a for any particular eigenvalue a is also uniquely
determined. It is thus only necessary to show that the sizes ni of the Jordan
blocks associated to an eigenvalue a are also uniquely determined.
Suppose therefore that a_1 = · · · = a_r = a but that a_i ≠ a for i > r, so a is an eigenvalue for which dim E(a) = r. For a Jordan block B(n; a) the matrix B(n; a) − aI is an n × n matrix with entries 1 just above the main diagonal and zero elsewhere, so its kernel is L(e_1); the matrix (B(n; a) − aI)^2 has entries 1 on the second line above the main diagonal and zero elsewhere, so its kernel is L(e_1) ⊕ L(e_2); and so on. Thus the dimension of the kernel of the linear mapping (B(n; a) − aI)^j is j if j ≤ n but is n if j ≥ n; and hence
$$\dim \ker\big(B(n; a) - aI\big)^j - \dim \ker\big(B(n; a) - aI\big)^{j-1} = \begin{cases} 1 & \text{if } j \leq n, \\ 0 & \text{if } j > n. \end{cases}$$
That determines the number of blocks B(n; a) of any size n uniquely just in
terms of the matrix A, and concludes the proof.
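This count is easy to check numerically; the following Python sketch (the test matrix, with blocks of sizes 3 and 1 for the eigenvalue 5, is chosen arbitrarily) recovers the block sizes from the kernel dimensions:

    import numpy as np
    from scipy.linalg import block_diag

    def jordan_block(n, a):
        return a * np.eye(n) + np.diag(np.ones(n - 1), 1)

    A = block_diag(jordan_block(3, 5.0), jordan_block(1, 5.0), jordan_block(2, 7.0))
    a, n = 5.0, A.shape[0]
    M = A - a * np.eye(n)

    # dim ker M^j = n - rank(M^j); its successive differences count the
    # Jordan blocks B(m; a) with m >= j.
    dim_ker = [0] + [n - np.linalg.matrix_rank(np.linalg.matrix_power(M, j))
                     for j in range(1, n + 1)]
    print(np.diff(dim_ker))        # [2 1 1 0 0 0]: blocks of sizes 3 and 1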
EXAMPLE
Consider the matrix
$$A = \begin{pmatrix} -5 & 9 & -4 \\ -4 & 8 & -4 \\ 1 & 0 & 0 \end{pmatrix},$$
with χ_A(λ) = (λ + 1)(λ − 2)^2. The eigenvectors are found by row reduction:
$$\text{for } \lambda = -1: \quad \left(\begin{array}{ccc|c} -4 & 9 & -4 & 0 \\ -4 & 9 & -4 & 0 \\ 1 & 0 & 1 & 0 \end{array}\right) \sim \left(\begin{array}{ccc|c} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right), \quad v_1 = \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix};$$
$$\text{for } \lambda = 2: \quad \left(\begin{array}{ccc|c} -7 & 9 & -4 & 0 \\ -4 & 6 & -4 & 0 \\ 1 & 0 & -2 & 0 \end{array}\right) \sim \left(\begin{array}{ccc|c} 1 & 0 & -2 & 0 \\ 0 & 1 & -2 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right), \quad v_2 = \begin{pmatrix} 2 \\ 2 \\ 1 \end{pmatrix}.$$
There is only one eigenvector v_2 for the eigenvalue λ = 2, so there must be a Jordan block of the form
$$\begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix};$$
thus in addition to the vector v_2 there must be another vector v_3 for which A v_3 = 2 v_3 + v_2, or (A − 2I)v_3 = v_2. That vector also can be found by solving this last system of linear equations by row reduction:
$$\left(\begin{array}{ccc|c} -7 & 9 & -4 & 2 \\ -4 & 6 & -4 & 2 \\ 1 & 0 & -2 & 1 \end{array}\right) \sim \left(\begin{array}{ccc|c} 1 & 0 & -2 & 1 \\ 0 & 1 & -2 & 1 \\ 0 & 0 & 0 & 0 \end{array}\right), \quad v_3 = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} + x_3 \begin{pmatrix} 2 \\ 2 \\ 1 \end{pmatrix}$$
for any value x_3; in particular it is possible just to take x_3 = 0. With this choice the normal form for the matrix A is
$$B = B(1; -1) \oplus B(2; 2) = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{pmatrix},$$
and for the matrix C whose columns are the vectors v_1, v_2, v_3, that is
$$C = \begin{pmatrix} -1 & 2 & 1 \\ 0 & 2 & 1 \\ 1 & 1 & 0 \end{pmatrix},$$
it follows that AC = CB, or C^{-1} A C = B.
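A numerical check of this example, as a Python sketch:

    import numpy as np

    A = np.array([[-5.0, 9.0, -4.0],
                  [-4.0, 8.0, -4.0],
                  [ 1.0, 0.0,  0.0]])
    C = np.column_stack([[-1.0, 0.0, 1.0],    # v1
                         [ 2.0, 2.0, 1.0],    # v2
                         [ 1.0, 1.0, 0.0]])   # v3
    B = np.array([[-1.0, 0.0, 0.0],
                  [ 0.0, 2.0, 1.0],
                  [ 0.0, 0.0, 2.0]])
    print(np.allclose(np.linalg.inv(C) @ A @ C, B))   # True: C^{-1} A C = B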
The characteristic and minimal polynomials for any matrix can be read directly from the Jordan block normal form: the characteristic polynomial is the product of the factors (λ − a_i)^{n_i} over all the Jordan blocks B(n_i; a_i), while the minimal polynomial contains one factor (λ − a)^n for each distinct eigenvalue a, where n is the size of the largest Jordan block with eigenvalue a. For instance, for the matrix B = B(1; −1) ⊕ B(2; 2) of the preceding example it follows that χ_B(λ) = m_B(λ) = (λ + 1)(λ − 2)^2.
Conversely if A is a 3 × 3 matrix for which χ_A(λ) = (λ − 7)^2 (λ − 1) and m_A(λ) = (λ − 7)(λ − 1) then A is similar to the matrix B = B(1; 1) ⊕ B(1; 7) ⊕ B(1; 7),
explicitly
$$B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 7 & 0 \\ 0 & 0 & 7 \end{pmatrix}.$$
The normal form of a matrix is not always determined uniquely just by its characteristic and minimal polynomials, although there are only finitely many different possibilities up to the order of the blocks. For example, B(2; a) ⊕ B(2; a) and B(2; a) ⊕ B(1; a) ⊕ B(1; a) have the same characteristic polynomial (λ − a)^4 and the same minimal polynomial (λ − a)^2.
The exponentials of Jordan blocks are easy to calculate. A Jordan block has the form B(n; a) = aI + N where the matrix N has entries 1 on the line above the main diagonal and 0 elsewhere. The matrix N^2 has entries 1 on the second line above the main diagonal and 0 elsewhere, N^3 has entries 1 on the third line above the main diagonal and 0 elsewhere, and so on; in particular N^n = 0, so the exponential series for e^{Nt} terminates. Since the matrices I and N commute,
$$e^{B(n;a)t} = e^{aIt + Nt} = e^{at}\, e^{Nt} = e^{at} \begin{pmatrix}
1 & t & \frac{1}{2!}t^2 & \frac{1}{3!}t^3 & \cdots & \frac{1}{(n-1)!}t^{n-1} \\
0 & 1 & t & \frac{1}{2!}t^2 & \cdots & \frac{1}{(n-2)!}t^{n-2} \\
0 & 0 & 1 & t & \cdots & \frac{1}{(n-3)!}t^{n-3} \\
 & & & & \ddots & \vdots \\
0 & 0 & 0 & 0 & \cdots & t \\
0 & 0 & 0 & 0 & \cdots & 1
\end{pmatrix}.$$
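This closed form can be compared with a general-purpose matrix exponential; a Python sketch using scipy.linalg.expm, with arbitrary test values of n, a and t:

    import numpy as np
    from math import factorial
    from scipy.linalg import expm

    n, a, t = 4, 1.5, 0.7
    B = a * np.eye(n) + np.diag(np.ones(n - 1), 1)    # Jordan block B(n; a)

    closed = np.exp(a * t) * np.array(
        [[t ** (j - i) / factorial(j - i) if j >= i else 0.0 for j in range(n)]
         for i in range(n)])
    print(np.allclose(expm(B * t), closed))           # True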
PART 2 – INNER PRODUCT SPACES
In this part of these notes, suppose that V is a finite-dimensional vector space over either the real or complex numbers, with an inner product (v, w).
Theorem 8 For any linear transformation T : V −→ V of a finite dimensional
vector space V with an inner product there is a unique linear transformation
T ∗ : V −→ V such that (T ∗ v, w) = (v, T w) for any vectors v, w ∈ V .
Proof: If e1 , . . . , en ∈ V is an orthonormal basis and if there is such a linear
transformation T ∗ then
(T ∗ ei , ej ) = (ei , T ej )
hence necessarily
$$T^* e_i = \sum_{j=1}^{n} (e_i, T e_j)\, e_j.$$
On the other hand this last formula does define a linear transformation T ∗ with
the asserted properties on a basis for V , hence on all of V .
The linear transformation T ∗ is called the adjoint transformation to T .
Theorem 9 The adjoint transformation has the following properties:
(i) (S + T )∗ = S ∗ + T ∗
(ii) (aT)∗ = ā T∗, where ā denotes the complex conjugate of the scalar a
(iii) (ST )∗ = T ∗ S ∗
(iv) I ∗ = I for the identity transformation I
(v) T ∗∗ = T
(vi) (T v, w) = (v, T ∗ w) for all v, w ∈ V
(vii) If T is represented by a matrix A in terms of an orthonormal basis for V then T∗ is represented by the conjugate transpose matrix $\overline{A}^{\,t}$ in terms of the same basis.
Proof: The proof follows quite easily from the existence and uniqueness of the
adjoint, the properties of the inner product, and the following observations, for
any vectors v, w ∈ V .
(i) (v, (S + T )w) = (v, Sw) + (v, T w) = (S ∗ v, w) + (T ∗ v, w) = ((S ∗ + T ∗ )v, w)
(ii) (v, aT w) = ā(v, T w) = ā(T∗ v, w) = (ā T∗ v, w)
(iii) (v, ST w) = (S ∗ v, T w) = (T ∗ S ∗ v, w)
(iv) (v, w) = (v, Iw) = (I ∗ v, w)
(v) $(v, Tw) = (T^*v, w) = \overline{(w, T^*v)} = \overline{(T^{**}w, v)} = (v, T^{**}w)$
(vi) $(Tv, w) = \overline{(w, Tv)} = \overline{(T^*w, v)} = (v, T^*w)$
(vii) If e_i ∈ V is an orthonormal basis then the matrix A representing T in terms of this basis has entries a_{ji} defined by $T e_i = \sum_j a_{ji} e_j$, and similarly the matrix A∗ representing T∗ has entries a∗_{ji} defined by $T^* e_i = \sum_j a^*_{ji} e_j$. Since the basis is orthonormal,
$$a^*_{ji} = (T^* e_i, e_j) = (e_i, T e_j) = \overline{(T e_j, e_i)} = \overline{a_{ij}}.$$
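Property (vii) can be checked numerically; the sketch below uses the convention of these notes that the inner product is linear in the first argument and conjugate-linear in the second:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
    v = rng.standard_normal(3) + 1j * rng.standard_normal(3)
    w = rng.standard_normal(3) + 1j * rng.standard_normal(3)

    ip = lambda x, y: np.sum(x * np.conj(y))   # (x, y), conjugate-linear in y
    A_star = A.conj().T                        # the conjugate transpose of A
    print(np.isclose(ip(A_star @ v, w), ip(v, A @ w)))   # True: (A* v, w) = (v, A w)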
Theorem 10 The following conditions on a linear transformation T : V −→ V
on a finite-dimensional real or complex vector space V with an inner product
are equivalent:
(i) T T ∗ = T ∗ T = I;
(ii) (T v, T w) = (v, w) for all v, w ∈ V ;
(iii) ||T v|| = ||v|| for all v ∈ V.
Proof:
(i)=⇒ (ii):
If (i) holds then (T v, T w) = (v, T ∗ T w) = (v, w) for all v, w ∈ V .
(ii)=⇒(i):
If (ii) holds then (v, w) = (T v, T w) = (v, T ∗ T w) for all v, w ∈ V , hence T ∗ T = I;
and for a finite-dimensional vector space it follows from this that T T ∗ = I as
well.
(ii) =⇒(iii):
If (ii) holds then for the special case w = v it follows that ||T v||^2 = (T v, T v) = (v, v) = ||v||^2, hence ||T v|| = ||v|| since both norms are nonnegative.
(iii) =⇒ (ii):
Recall that ||v + w||^2 = (v + w, v + w) = (v, v) + (v, w) + (w, v) + (w, w) = ||v||^2 + 2ℜ(v, w) + ||w||^2, where ℜz denotes the real part of a complex number z. If (iii) holds then 2ℜ(T v, T w) = ||T(v + w)||^2 − ||T v||^2 − ||T w||^2 = ||v + w||^2 − ||v||^2 − ||w||^2 = 2ℜ(v, w), and that is condition (ii) in the real case. For the complex case note that −ℑ(T v, T w) = ℜ i(T v, T w) = ℜ(T iv, T w) = ℜ(iv, w) = ℜ i(v, w) = −ℑ(v, w), where ℑz denotes the imaginary part of a complex number z. The equalities for the real and imaginary parts separately show that (ii) holds.
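A numerical illustration of the theorem, as a Python sketch: a unitary matrix, obtained here for illustration from the QR factorization of a random complex matrix, satisfies conditions (i) and (iii):

    import numpy as np

    rng = np.random.default_rng(1)
    M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    U, _ = np.linalg.qr(M)                     # U has orthonormal columns

    print(np.allclose(U.conj().T @ U, np.eye(4)))                 # (i):  U* U = I
    v = rng.standard_normal(4) + 1j * rng.standard_normal(4)
    print(np.isclose(np.linalg.norm(U @ v), np.linalg.norm(v)))   # (iii): ||Uv|| = ||v||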
TERMINOLOGY
A linear transformation T : V −→ V of an inner product space is called self-adjoint (symmetric in the real case, Hermitian in the complex case) if T∗ = T; it is called orthogonal in the real case, or unitary in the complex case, if it satisfies the equivalent conditions of the preceding theorem, so that T∗T = TT∗ = I; and it is called normal if TT∗ = T∗T. Clearly any self-adjoint,
orthogonal or unitary transformation is normal; but there are also other normal
linear transformations. The matrices of these special types of normal linear
transformations have eigenvalues of special forms as well.
Theorem 12 If T : V −→ V is a normal linear transformation of a finite-dimensional inner product space V then:
(i) T v = 0 if and only if T∗ v = 0;
(ii) a vector v is an eigenvector of T with eigenvalue a if and only if v is an eigenvector of T∗ with eigenvalue ā;
(iii) if v_1 and v_2 are eigenvectors of T for distinct eigenvalues a_1 ≠ a_2 then (v_1, v_2) = 0.
Proof:
(i) If T v = 0 then 0 = (T v, T v) = (v, T ∗ T v) = (v, T T ∗ v) = (T ∗ v, T ∗ v) so that
T ∗ v = 0.
(ii) Note that T − aI is normal whenever T is normal, and that (T − aI)∗ = T∗ − āI; then from (i) it follows that (T − aI)v = 0 if and only if (T∗ − āI)v = 0.
(iii) Since a_1(v_1, v_2) = (a_1 v_1, v_2) = (T v_1, v_2) = (v_1, T∗ v_2) = (v_1, ā_2 v_2) = a_2(v_1, v_2), it follows that (a_1 − a_2)(v_1, v_2) = 0, so that (v_1, v_2) = 0 if a_1 ≠ a_2.
It follows from the preceding theorem that a normal linear transformation T
and its adjoint T ∗ have the same eigenvectors, although with complex conjugate
eigenvalues; and that the eigenspaces for distinct eigenvalues are orthogonal.
Let a be an eigenvalue of T and let W = E(a) ⊂ V be the corresponding eigenspace of T. The subspace W⊥ is then T∗-invariant; indeed if v ∈ W⊥ then for all
w ∈ W it follows that (T ∗ v, w) = (v, T w) = 0, since T w ∈ W , hence T ∗ v ∈ W ⊥
as well. The subspace W is also T ∗ -invariant, since it is the eigenspace of T ∗
for the eigenvalue a by Theorem 12(ii); and since T ∗∗ = T it follows by the
argument of the preceding sentence that the subspace W ⊥ is also T -invariant.
The restrictions of T and T ∗ to the subspace W ⊥ also satisfy (T ∗ v, w) = (v, T w)
for any v, w ∈ W ⊥ , so that the restriction of T ∗ to W ⊥ is the adjoint to the
restriction of T to that subspace; and therefore the restriction of T to that
subspace is also normal. It follows from the induction hypothesis that there is
an orthonormal basis for W ⊥ that consists of eigenvectors of T ; and when that
basis is extended to an orthonormal basis of V by adjoining enough additional
vectors in W there results an orthonormal basis of V , which concludes the proof.
If there is an orthonormal basis for V consisting of eigenvectors for T then
the basis also consists of eigenvectors for T ∗ by Theorem 12(ii), so both T and
T ∗ are represented by diagonal matrices; and it follows that T T ∗ = T ∗ T , so that
T must be normal. Thus the normality of T is both necessary and sufficient for
there to exist an orthonormal basis of eigenvectors for T . The spectral theorem
can be restated in terms of matrices as follows.
Theorem 14 Any normal complex matrix A can be written as A = U D U∗ for a unitary matrix U and a diagonal matrix D whose diagonal entries are the eigenvalues of A.
Proof: By the preceding theorem there is an orthonormal basis u_1, . . . , u_n of eigenvectors of A; let U be the matrix with the vectors u_i as its columns, so that AU = UD for the diagonal matrix D of the corresponding eigenvalues. The orthonormality of the basis is the condition that (u_i, u_j) = δ_{ji}, where δ_{ji} are the entries of the identity matrix (also called Kronecker's symbol), or in matrix terms $I = \overline{U}^{\,t} U = U^* U$; since U is a square matrix it follows that U U∗ = I as well, so the matrix U is unitary, and hence A = U D U∗.
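For the self-adjoint (Hermitian) special case of a normal matrix, this factorization can be checked numerically with np.linalg.eigh; a Python sketch:

    import numpy as np

    rng = np.random.default_rng(2)
    M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    A = M + M.conj().T                         # a Hermitian, hence normal, matrix

    eigvals, U = np.linalg.eigh(A)             # columns of U: orthonormal eigenvectors
    print(np.allclose(U.conj().T @ U, np.eye(4)))               # U is unitary
    print(np.allclose(U @ np.diag(eigvals) @ U.conj().T, A))    # A = U D U*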
The hypothesis that V is a complex vector space is essential, since linear
transformations of real vector spaces need not have any eigenvectors at all.
However for special real linear transformations the spectral theorem does hold.
Theorem 15 (Real Spectral Theorem) If T : V −→ V is a symmetric transformation of a finite-dimensional real vector space V with an inner product then V has an orthonormal basis of eigenvectors for T.
Proof: The same argument as in the preceding corollary yields the desired
result.