Lecture Notes
Alan Lauder∗
Contents

2.1 Rings
3 Quotient Spaces
4 Triangular Form and the Cayley-Hamilton Theorem
7 Dual Spaces
7.1 Annihilators

∗ These notes are a revision of ones kindly provided by Ulrike Tillmann.
1 Vector Spaces and Linear maps
Before we can define vector spaces we need the notion of a field (discussed in Prelims Analysis I).
Definition 1.1. A set F with two binary operations + and × is a field if both (F, +, 0) and
(F \ {0}, ×, 1) are abelian groups and the distributive law holds: a(b + c) = ab + ac for all a, b, c ∈ F.

The characteristic of F is the smallest positive integer p such that

1 + 1 + · · · + 1 (p times) = 0,

if such a p exists; otherwise the characteristic is defined to be 0.
Example 1.2 The following are examples of fields (Fp and number fields like Q[i] are discussed
in the exercises).
Characteristic 0 : Q, Q[i], R, C.
Characteristic p : Fp = {0, 1, · · · , p − 1} with arithmetic modulo p.
Definition 1.3. A vector space V over a field F is an abelian group (V, +, 0) together with a
scalar multiplication F × V → V such that for all a, b ∈ F, v, w ∈ V :
(1) a(v + w) = av + aw
(2) (a + b)v = av + bv
(3) (ab)v = a(bv)
(4) 1 · v = v
Definition 1.4. Let V be a vector space over F.

(1) A set S ⊆ V is linearly independent if whenever a1 s1 + · · · + an sn = 0 for distinct s1 , · · · , sn ∈ S and a1 , · · · , an ∈ F, then a1 = · · · = an = 0.

(2) A set S ⊆ V is spanning if for all v ∈ V there exist a1 , · · · , an ∈ F and s1 , · · · , sn ∈ S with

v = a1 s1 + · · · + an sn .
(3) A set B ⊆ V is a basis of V if B is spanning and linearly independent. The size of B is the
dimension of V .
You saw in Prelims Linear Algebra I that every vector space with a finite spanning set has a basis
and that the dimension of such vector spaces is well-defined.
Example 1.5
(3) Let

V = R^N = {(a1 , a2 , a3 , · · · ) | ai ∈ R}

be the vector space of all real sequences, with coordinatewise operations.
Definition 1.6. Suppose V and W are vector spaces over F. A map T : V → W is a linear
transformation (or just linear map) if for all a ∈ F, v, v ′ ∈ V ,
T (av + v ′ ) = aT (v) + T (v ′ ).
Example 1.7
(1) The linear map T : R[x] → R[x] given by f (x) ↦ xf (x) is an injection; it defines an isomorphism from R[x] to its image xR[x].

(2) The linear map T : W ⊆ R^N → R[x] given by en = (0, · · · , 1, 0, · · · ) ↦ xn−1 defines an isomorphism, where W is the subspace of finite sequences.
(3) Let Hom(V, W ) be the set of linear maps from V to W . For a ∈ F, v ∈ V , and S, T ∈
Hom(V, W ) define:
(aT )(v) := a(T (v))
(T + S)(v) := T (v) + S(v)
With these definitions Hom(V, W ) is a vector space over F.
Every linear map T : V → W is determined by its values on a basis B for V (as B is spanning). Conversely, given any map T : B → W , it can be extended to a linear map T : V → W (as B is linearly independent).
Let B = {e1 , · · · , en } and B′ = {e′1 , · · · , e′m } be bases for V and W respectively. Let B′ [T ]B be
the matrix with (i, j)-entry aij such that:
T (ej ) = a1j e′1 + · · · + amj e′m .
(We call B the initial basis and B′ the final basis.1 ) Note that B′ [aT ]B = a(B′ [T ]B ) and B′ [T +
S]B = B′ [T ]B + B′ [S]B .
Furthermore, if S ∈ Hom(W, U ) for some finite dimensional vector space U with basis B′′ , then:
B′′ [S ◦ T ]B = B′′ [S]B′ B′ [T ]B
In particular, if T : V → V and B and B′ are two different bases with B [Id]B′ the change of basis
matrix then:
B′ [T ]B′ = B′ [Id]B B [T ]B B [Id]B′ with B [Id]B′ B′ [Id]B = B [Id]B = I, the identity matrix.
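These bookkeeping rules are easy to check numerically. Here is a minimal sketch with NumPy, using made-up matrices (not ones from the text), illustrating the change of basis formula B′ [T ]B′ = B′ [Id]B B [T ]B B [Id]B′ :

```python
import numpy as np

# Matrix of T : R^2 -> R^2 with respect to the standard basis B.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# A second basis B'; its vectors are the columns of P, so P = B[Id]B'.
P = np.array([[1.0, 1.0],
              [0.0, 1.0]])

# B'[T]B' = B'[Id]B  B[T]B  B[Id]B'  =  P^{-1} A P.
A_prime = np.linalg.inv(P) @ A @ P

# Check: applying T to the first B'-vector, computed both ways.
lhs = P @ (A_prime @ np.array([1.0, 0.0]))  # via B'-coordinates
rhs = A @ P[:, 0]                           # directly in B-coordinates
assert np.allclose(lhs, rhs)
```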
The study of vector spaces and linear maps between them naturally leads us to the study of rings;
in particular, the ring of polynomials F[x] and the ring of (n × n)-matrices Mn (F).
1 In Prelims Linear Algebra II, I used the notation MB′ B (T ), but I think B′ [T ]B is better as it helps you remember which is the initial basis and which the final one.
2.1 Rings
Definition 2.1. A non-empty set R with two binary operations + and × is a ring if (R, +, 0) is an
abelian group, the multiplication × is associative and the distributive laws hold: for all a, b, c ∈ R,
(a + b)c = ac + bc and a(b + c) = ab + ac.
The ring R is called commutative if for all a, b ∈ R we have ab = ba.
Example 2.2
Definition 2.3. A map φ : R → S between two rings is a ring homomorphism if for all
r, r′ ∈ R:
φ(r + r′ ) = φ(r) + φ(r′ ) and φ(rr′ ) = φ(r)φ(r′ ).
A bijective ring homomorphism is called a ring isomorphism.
Example 2.4 When W = V and B′ = B, we can reinterpret Theorem 1.8 to say that T ↦ B [T ]B defines an isomorphism of rings from Hom(V, V ) to Mn (F), where n is the dimension of V .
Definition 2.5. A subset I ⊆ R is an ideal if (I, +) is a subgroup of (R, +) and for all r ∈ R and a ∈ I we have ra ∈ I and ar ∈ I. A subset S ⊆ R closed under addition, negation and multiplication is a subring.

Warning: Some books insist on rings having a multiplicative identity 1 and on ring homomor-
phisms taking 1 to 1. If we do not insist on rings having 1’s, then any ideal is a subring. (Note
that in a ring with an identity 1, any ideal that contains 1 is the whole ring.)
Example 2.6
(1) mZ is an ideal in Z. Indeed, every ideal in Z is of this form. [To prove this for I 6= {0}, let m be the smallest positive integer in the ideal I and prove that I = mZ.]
(2) The set of diagonal matrices in Mn (R) is closed under addition and multiplication (i.e. it is a
subring) but for n > 1 is not an ideal.
Ideals are to rings what normal subgroups are to groups, in the sense that the set R/I of additive cosets inherits a ring structure from R if I is an ideal. For r, r′ ∈ R define

(r + I) + (r′ + I) := (r + r′ ) + I and (r + I)(r′ + I) := rr′ + I.
Theorem 2.7 (First Isomorphism Theorem). The kernel Ker(φ) := φ−1 (0) of a ring homomorphism φ : R → S is an ideal, its image Im(φ) is a subring of S, and φ induces an isomorphism of rings

R/Ker(φ) ≅ Im(φ).
Proof. Exercise. [Show that the underlying isomorphism of abelian groups is compatible with the multiplication, i.e. is a ring homomorphism.]
We will discuss polynomials over a field F in more detail. The first goal is to show that there is a division algorithm for polynomial rings. With the help of this we will be able to show the important property that every ideal in F[x] is generated by one element.
Theorem 2.8. [“Division algorithm” for polynomials] Let f (x), g(x) ∈ F[x] be two polynomials with g(x) ≠ 0. Then there exist q(x), r(x) ∈ F[x] such that

f (x) = q(x)g(x) + r(x) with deg r(x) < deg g(x).
Proof. If deg f (x) < deg g(x), put q(x) = 0, r(x) = f (x). Assume now that deg f (x) ≥ deg g(x)
and let
f (x) = an xn + an−1 xn−1 + · · · + a0
g(x) = bk xk + bk−1 xk−1 + · · · + b0 , (bk ≠ 0)
Then

deg( f (x) − (an /bk ) xn−k g(x) ) < n.
By induction on deg f − deg g, there exist s(x), t(x) such that
f (x) − (an /bk ) xn−k g(x) = s(x)g(x) + t(x) and deg t(x) < deg g(x).
Hence put q(x) = (an /bk ) xn−k + s(x) and r(x) = t(x).
Corollary 2.9. For all f (x) ∈ F[x] and a ∈ F, we have f (a) = 0 if and only if (x − a) divides f (x).

Proof. If (x − a) divides f (x) then clearly f (a) = 0. Conversely, by Theorem 2.8 write f (x) = q(x)(x − a) + r(x), where r(x) is constant (as deg r(x) < 1). Evaluating at a gives
f (a) = 0 = q(a)(a − a) + r = r
and hence r = 0.
Corollary 2.10. Assume f ≠ 0. If deg f ≤ n then f has at most n roots.
Let a(x), b(x) ∈ F[x] be two polynomials. Let c(x) be a monic polynomial of highest degree dividing both a(x) and b(x); we write c = gcd(a, b) (also written, less commonly, hcf(a, b)).
Proposition 2.11. Let a, b ∈ F[x] be non-zero polynomials and let gcd(a, b) = c. Then there exist
s, t ∈ F[x] such that:
a(x)s(x) + b(x)t(x) = c(x).
Proof. If c ≠ 1, divide a and b by c. We may thus assume deg(a) ≥ deg(b) and gcd(a, b) = 1, and will proceed by induction on deg(a) + deg(b). By Theorem 2.8 write a = qb + r with deg(r) < deg(b). If r = 0 then b divides a, so as gcd(a, b) = 1 the polynomial b is a non-zero constant, and we may take s = 0 and t = 1/b.

Assume r ≠ 0. Then by the induction hypothesis (applied to b and r), there exist s′ , t′ ∈ F[x] such that
bs′ + rt′ = 1.
Hence,
bs′ + (a − qb)t′ = 1 and at′ + b(s′ − qt′ ) = 1
So, we may put s = t′ and t = s′ − qt′ .
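Unwinding the induction gives the usual extended Euclidean algorithm for polynomials. A sketch building on poly_divmod from the previous example; the helper names are our own:

```python
def poly_ext_gcd(a, b):
    """Return (c, s, t) with a*s + b*t = c = gcd(a, b) in Q[x],
    mirroring the induction in the proof of Proposition 2.11.
    (c is not normalised to be monic in this sketch.)"""
    # Base case: gcd(a, 0) = a = a*1 + b*0.
    if not any(b):
        return a, [1], [0]
    q, r = poly_divmod(a, b)             # a = q*b + r
    c, s1, t1 = poly_ext_gcd(b, r)       # b*s1 + r*t1 = c
    # Substitute r = a - q*b:  a*t1 + b*(s1 - q*t1) = c,
    # exactly the bookkeeping step at the end of the proof.
    return c, t1, poly_sub(s1, poly_mul(q, t1))

def poly_mul(f, g):
    out = [0] * max(len(f) + len(g) - 1, 1)
    for i, x in enumerate(f):
        for j, y in enumerate(g):
            out[i + j] += x * y
    return out

def poly_sub(f, g):
    n = max(len(f), len(g))
    f = f + [0] * (n - len(f))
    g = g + [0] * (n - len(g))
    return [x - y for x, y in zip(f, g)]
```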
Exercise: Prove that every ideal I ⊆ F[x] is generated by one element. In other words, given an ideal I there exists a polynomial c(x) such that

I = c(x)F[x] := {c(x)g(x) | g(x) ∈ F[x]}.
We will now turn to evaluating polynomials at (n × n)-matrices. Given a matrix we can associate two special polynomials to it, its characteristic and its minimal polynomial. As we will see, these will encode much of the information of interest. For f (x) = ak xk + · · · + a0 ∈ F[x] and A ∈ Mn (F) define

f (A) := ak Ak + · · · + a0 I ∈ Mn (F).
Since Ap Aq = Aq Ap and λA = Aλ for p, q ≥ 0 and λ ∈ F, for all f (x), g(x) ∈ F[x] we have that

f (A)g(A) = g(A)f (A);
Av = λv ⇒ f (A)v = f (λ)v.
Lemma 2.12. For all A ∈ Mn (F), there exists a non-zero polynomial f (x) ∈ F[x] such that
f (A) = 0.
Proof. Note that the dimension dim Mn (F) = n2 is finite. Hence {I, A, A2 , · · · , Ak } as a subset of Mn (F) is linearly dependent for k ≥ n2 . So there exist scalars ai ∈ F, not all zero, such that

ak Ak + · · · + a0 I = 0;

that is, f (A) = 0 for the non-zero polynomial f (x) = ak xk + · · · + a0 .
We can express much of the above in terms of ring theory as follows. For any (n × n)-matrix A,
the assignment f (x) 7→ f (A) defines a ring homomorphism
EA : F[x] → Mn (F).
Lemma 2.12 tells us the kernel is non-zero, and moreover as F[x] is commutative so is the image
of EA ; that is, f (A)g(A) = g(A)f (A) for all polynomials f and g.
Our next step is to determine the unique monic polynomial generating the kernel of EA .
2.4 Minimal and characteristic polynomials
Definition 2.13. The minimal polynomial of A, denoted by mA (x), is the monic polynomial
p(x) of least degree such that p(A) = 0.
Theorem 2.14. If f (A) = 0 then mA |f . Furthermore, mA is unique (hence showing that mA is
well-defined).
Proof. By the division algorithm, Theorem 2.8, there exist polynomials q, r with deg r < deg mA
such that
f = qmA + r.
Evaluating both sides at A gives r(A) = 0. By the minimality property of mA ,
r=0
and mA divides f . To show uniqueness, let m be another monic polynomial of minimal degree
and m(A) = 0. Then by the above mA |m. Also m and mA must have the same degree, and so
m = amA for some a ∈ F. Since both polynomials are monic it follows that a = 1 and m = mA .
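Lemma 2.12 and Theorem 2.14 together suggest a naive way to compute mA : search for the first power Ak that is linearly dependent on I, A, . . . , Ak−1 . A sketch using SymPy's exact arithmetic; the function is our own, not a library routine:

```python
import sympy as sp

def minimal_polynomial_of(A):
    """Find m_A by locating the first power A^k that is a linear
    combination of I, A, ..., A^(k-1), in the spirit of Lemma 2.12."""
    n = A.shape[0]
    x = sp.symbols('x')
    powers = [sp.eye(n)]
    for k in range(1, n * n + 1):
        powers.append(powers[-1] * A)
        # Columns of M are I, A, ..., A^k, each flattened to a vector.
        M = sp.Matrix([[p[i, j] for p in powers]
                       for i in range(n) for j in range(n)])
        null = M.nullspace()
        if null:                       # first dependence a0*I + ... + ak*A^k = 0
            coeffs = list(null[0])     # here ak != 0 since lower powers were independent
            m = sum(c * x**i for i, c in enumerate(coeffs))
            return sp.expand(m / coeffs[-1])   # normalise to be monic
    raise RuntimeError("unreachable by Lemma 2.12")

A = sp.Matrix([[1, 2], [-1, 0]])
print(minimal_polynomial_of(A))   # x**2 - x + 2 (the matrix of Example 4.9 below)
```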
Recall that the characteristic polynomial of A is χA (x) := det(A − xI).

Theorem 2.18. Let λ ∈ F. Then:

λ is an eigenvalue of A
⇔ λ is a root of χA (x)
⇔ λ is a root of mA (x)
Proof. That λ is an eigenvalue of A if and only if det(A − λI) = 0, i.e. χA (λ) = 0, is Prelims. If Av = λv with v ≠ 0, then 0 = mA (A)v = mA (λ)v, so λ is a root of mA .
Conversely, assume λ is a root of mA . Then mA (x) = g(x)(x − λ) for some polynomial g. By minimality of mA , we have g(A) ≠ 0. Hence there exists w ∈ Fn such that g(A)w ≠ 0. Put v = g(A)w; then
(A − λI)v = mA (A)w = 0,
and v is a λ-eigenvector for A.
One of our next goals is to prove that χA annihilates A and hence that mA |χA .
We finish this section by recording how to translate back what we have learnt about matrices to
the world of linear maps. In particular we will show that it makes sense to speak of a minimal and
characteristic polynomial of a linear transformation.
The Fundamental Theorem of Algebra states that every non-constant polynomial in C[x] has a root in C; that is, C is algebraically closed. We will not be able to show this in this course. However, you should be able to prove it using complex analysis by the end of this term. (Consider g(z) = 1/f (z). If f (z) has no roots in C, it is holomorphic and bounded on C, which leads to a contradiction.)
Definition 2.23. An algebraically closed field F̄ containing F, with the property that there does not exist a smaller algebraically closed field L with

F̄ ⊇ L ⊇ F,

is called an algebraic closure of F.
Every field F has an algebraic closure F̄. The proof of this is beyond this course, but it will be convenient to appeal to the result.
Challenge: Prove that no finite field is algebraically closed. [Hint: imitate the standard proof
that there are infinitely many primes.]
3 Quotient Spaces
In the theory of groups and rings the notion of a quotient is an important and natural concept.
Recall that the image of a group or ring homomorphism is best understood as a quotient of the
source by the kernel of the homomorphism. Similarly, for vector spaces it is natural to consider
quotient spaces.
Let U be a subspace of a vector space V over F. The quotient V /U is the set of cosets {v + U | v ∈ V } with the operations

(v + U ) + (w + U ) := v + w + U
a(v + U ) := av + U

for v, w ∈ V and a ∈ F. These operations make V /U into a vector space over F.
Proof. We need to check that the operations are well-defined. Assume v + U = v ′ + U and
w + U = w′ + U . Then v = v ′ + u, w = w′ + ũ for u, ũ ∈ U . Hence:
(v + U ) + (w + U ) = v + w + U
= v ′ + u + w′ + ũ + U
= v ′ + w′ + U as u + ũ ∈ U
= (v ′ + U ) + (w′ + U ).
Similarly,
a(v + U ) = av + U
= av ′ + au + U
= av ′ + U as au ∈ U
= a(v ′ + U ).
That these operations satisfy the vector space axioms follows immediately from the fact that the
operations in V satisfy them.
Often in the literature the quotient construction is avoided in the context of vector spaces. This
is because any quotient V /U of a vector space V by a subspace U can be “realised” itself as a subspace of V itself.2 That is, by extending a basis for U to one for V , we can choose a subspace
W such that V = U ⊕ W . Then each v ∈ V can be written uniquely as u + w for some u ∈ U and
w ∈ W , and this allows us to define an isomorphism V /U → W by v + U ↦ w. However, such an
isomorphism involves a choice of W and it is often easier to avoid having to make this choice (and
thus avoid showing that further constructions and results are independent of it).
Let E be a basis of U , and extend E to a basis B of V (we assume this is possible, which we
certainly know to be the case at least for V finite dimensional).
Define
B := {e + U | e ∈ B\E} ⊆ V /U.
We claim that B̄ is a basis for V /U . For linear independence, suppose that for some e1 , · · · , er ∈ B\E and a1 , · · · , ar ∈ F we have

a1 (e1 + U ) + · · · + ar (er + U ) = U.

Then a1 e1 + · · · + ar er ∈ U , and so

a1 e1 + · · · + ar er = b1 e′1 + · · · + bs e′s
2 This is in contrast to the world of groups and rings. For example Z/2Z is a quotient group (and ring) of Z, but it is not isomorphic to any subgroup (or subring) of Z.
for some e′1 , · · · , e′s ∈ E and b1 , · · · , bs ∈ F as E spans U . But then a1 = · · · = ar = −b1 = · · · =
−bs = 0 as B is linearly independent, and thus B̄ is linearly independent.
Proposition 3.3. Let U ⊂ V be vector spaces, with E a basis for U , and F ⊂ V a set of vectors
such that
{v + U : v ∈ F}
is a basis for the quotient V /U . Then the union
E∪F
is a basis for V .
Proof. Exercise.
Example 3.4
V = F[x]                      B = {1, x, x2 , · · · }
U = even polynomials          E = {1, x2 , x4 , · · · }
V /U ≃ odd polynomials        B̄ = {x + U, x3 + U, · · · }
Theorem 3.6 (First Isomorphism Theorem). Let T : V → W be a linear map of vector spaces
over F. Then
T̄ : V /Ker(T ) → Im(T )
v + Ker(T ) ↦ T (v)
is an isomorphism of vector spaces.
Proof. It follows from the first isomorphism theorem for groups that T̄ is an isomorphism of (abelian) groups. T̄ is also compatible with scalar multiplication. Thus T̄ is a linear isomorphism.
Detailed working:
Surjective: w ∈ Im(T )
⇒ ∃v ∈ V : T (v) = w
⇒ T̄ (v + Ker(T )) = T (v) = w
⇒ w ∈ Im(T̄ ).
Lemma 3.8. The formula T̄ (v + A) := T (v) + B gives a well-defined linear map of quotients T̄ : V /A → W/B if and only if T (A) ⊆ B.
Proof. Assume T (A) ⊆ B. Now T̄ will be linear if it is well-defined. Assume v + A = v ′ + A. Then v = v ′ + a for some a ∈ A. So

T̄ (v + A) = T (v) + B by definition
= T (v ′ + a) + B
= T (v ′ ) + T (a) + B as T is linear
= T (v ′ ) + B as T (A) ⊆ B
= T̄ (v ′ + A).
Assume now that V and W are finite dimensional. Let B = {e1 , · · · , en } be a basis for V with
E = {e1 , · · · , ek } a basis for a subspace A ⊆ V (so k ≤ n). Let B′ = {e′1 , · · · , e′m } be a basis for W
with E′ = {e′1 , · · · , e′ℓ } a basis for a subspace B ⊆ W . The induced bases for V /A and W/B are
given by
B̄ = {ek+1 + A, · · · , en + A} and
B̄′ = {e′ℓ+1 + B, · · · , e′m + B}.
Let T : V → W be a linear map such that T (A) ⊆ B. Then T induces a map T̄ on quotients by Lemma 3.8 and restricts to a linear map T |A : A → B.

Theorem 3.9. With notation as above, let B′ [T ]B = (aij ). Then

B′ [T ]B = [ E′ [T |A ]E   ∗         ]
           [ 0            B̄′ [T̄ ]B̄ ]
Proof. For j ≤ k, T (ej ) ∈ B and hence aij = 0 for i > ℓ, while aij is equal to the (i, j)-entry of E′ [T |A ]E for i ≤ ℓ. To identify the bottom right corner of the matrix, note that

T̄ (ej + A) = T (ej ) + B
= a1j e′1 + · · · + amj e′m + B
= aℓ+1,j (e′ℓ+1 + B) + · · · + amj (e′m + B).
4 Triangular Form and the Cayley-Hamilton Theorem
The goal of this chapter is to prove that the characteristic polynomial of an (n × n)-matrix is
annihilating; that is, the polynomial vanishes when evaluated at the matrix. This will also give us
control on the minimal polynomial.
Definition 4.1. A subspace U ⊆ V is T -invariant for a linear map T : V → V if T (U ) ⊆ U . If U is T -invariant, then U is invariant under any polynomial p(x) evaluated at T . Furthermore, p(T ) then induces a map of quotients

p(T )‾ : V /U → V /U,

and this induced map equals p(T̄ ).
Example 4.2 Let Vλ := ker(T − λI) be the λ-eigenspace of T . Then Vλ is T -invariant. Let W :=
ker(g(T )) be the kernel of g(T ) for some g(x) ∈ F[x]. Then W is T -invariant as g(T )T = T g(T ).
Proposition 4.3. Let U ⊆ V be a T -invariant subspace. Then χT (x) = χT |U (x) · χT̄ (x).

Proof. Extend a basis E for U to a basis B of V . Let B̄ be the associated basis for V /U . By Theorem 3.9,

B [T ]B = [ E [T |U ]E   ∗        ]
          [ 0           B̄ [T̄ ]B̄ ]

The determinant of such an upper triangular block matrix is the product of the determinants of the diagonal blocks; applying this to B [T − xI]B gives the result.
Note that this formula does not hold for the minimal polynomial (the identity map yielding a
counterexample in dimension ≥ 2).
Definition 4.4. An n × n matrix A = (aij ) is upper triangular if aij = 0 for all i > j.
Theorem 4.5. Let V be a finite-dimensional vector space, and let T : V → V be a linear map
such that its characteristic polynomial is a product of linear factors. Then, there exists a basis B
of V such that B [T ]B is upper triangular.
Note: If F is an algebraically closed field, such as C, then the characteristic polynomial always
satisfies the hypothesis.
Proof. By induction on the dimension of V . Note when V is one dimensional, there is nothing
more to prove. In general, by assumption χT has a root λ and hence there exists a v1 ≠ 0 such that T v1 = λv1 . Put U = ⟨v1 ⟩, the line spanned by v1 . As v1 is an eigenvector, U is T -invariant.
Thus we may consider the induced map on quotients

T̄ : V /U → V /U.
By Proposition 4.3,

χT̄ (x) = χT (x)/(λ − x),

and hence χT̄ is also a product of linear factors; furthermore dim V /U = dim(V ) − 1. Hence, by the induction hypothesis, there exists B̄ = {v2 + U, . . . , vn + U } such that B̄ [T̄ ]B̄ is upper triangular.
Put B = {v1 , v2 , . . . , vn }. Then B is a basis for V , by Proposition 3.3, and

B [T ]B = [ λ   ∗        ]
          [ 0   B̄ [T̄ ]B̄ ]

is upper triangular.
Corollary 4.6. Let A ∈ Mn (F) be such that χA (x) is a product of linear factors. Then there exists an invertible matrix P ∈ Mn (F) such that P −1 AP is upper triangular.

Proposition 4.7. Let A be an upper triangular matrix with diagonal entries λ1 , . . . , λn (in order). Then (A − λ1 I) · · · (A − λn I) = 0.

Proof. Write e1 , . . . , en for the standard basis vectors, and note that (A − λi I)ei ∈ ⟨e1 , . . . , ei−1 ⟩ for each i. Hence, since

Im(A − λn I) ⊆ ⟨e1 , . . . , en−1 ⟩
Im((A − λn−1 I)(A − λn I)) ⊆ ⟨e1 , . . . , en−2 ⟩

and so on, we have that

(A − λ1 I)(A − λ2 I) · · · (A − λn I) = 0

as required.
Theorem 4.8 (Cayley-Hamilton). Let V be a finite dimensional vector space over F and T : V → V a linear map. Then χT (T ) = 0.

Proof. Let A be the matrix of T with respect to some basis for V . We will work over the algebraic closure F̄ ⊇ F. 3 In F̄[x], every polynomial factors into linear terms. Thus, by Corollary 4.6, there exists a matrix P ∈ Mn (F̄) such that P −1 AP is upper triangular with diagonal entries λ1 , . . . , λn . Thus,

χP −1 AP (x) = (−1)dim(V ) (x − λ1 ) · · · (x − λn )

and by Proposition 4.7, we have χP −1 AP (P −1 AP ) = 0. As

χT (x) = χA (x) = χP −1 AP (x),
we have that also χT (T ) = 0. The minimal polynomial divides annihilating polynomials by Theo-
rem 2.14, and so mT (x) | χT (x).
What is wrong with the following “proof” of the Cayley-Hamilton theorem? “χA (x) := det(A−xI)
and hence χA (A) = det(A − A · I) = det(0) = 0”. (This is not a proof; come to the lectures to find
out why.)
Example 4.9 Let

A = [ 1   2 ]
    [ −1  0 ],

χA (x) = det [ 1 − x   2  ] = x2 − x + 2,
             [ −1     −x ]

χA (A) = A2 − A + 2I = [ −1   2 ] − [ 1   2 ] + [ 2  0 ] = 0.
                       [ −1  −2 ]   [ −1  0 ]   [ 0  2 ]
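A quick numerical sanity check of this computation (a sketch; it does not, of course, replace the proof):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [-1.0, 0.0]])

# chi_A(x) = x^2 - x + 2 evaluated at A should be the zero matrix.
assert np.allclose(A @ A - A + 2 * np.eye(2), 0)
```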
3 It would be enough to work over the finite extension of fields L = F[λ1 , . . . , λn ].
As A − λI ≠ 0 for any choice of λ, the minimal polynomial cannot be of degree one. As mA | χA we must have mA = χA . (Alternatively, since χA (x) has non-zero discriminant, it has two distinct roots; since the minimal and characteristic polynomials have the same roots, not counting multiplicity, and mA | χA , we must here have mA = χA .)
Example 4.10

A = [ 1  1  0  0 ]
    [ 0  1  0  0 ]
    [ 0  0  2  0 ]
    [ 0  0  0  2 ],    χA (x) = (1 − x)2 (2 − x)2 .
Our goal is to use the Cayley-Hamilton Theorem and Proposition 2.11 to decompose V into T -
invariant subspaces. We start with some remarks on direct sum decompositions.
Recall that V is the direct sum of subspaces W1 , · · · , Wr , written

V = W1 ⊕ · · · ⊕ Wr ,

if every v ∈ V can be written uniquely as

v = w1 + · · · + wr with wi ∈ Wi .
Assume from now on that V is finite dimensional. Choose a basis Bi for each Wi , so that B = B1 ∪ · · · ∪ Br is a basis for V . If T : V → V is a linear map such that each Wi is T -invariant, then the matrix of T with respect to the basis B is block diagonal:

B [T ]B = [ A1          ]
          [     ⋱      ]
          [         Ar ]    with Ai = Bi [T |Wi ]Bi ,

and χT (x) = χT |W1 (x) · · · χT |Wr (x).
Proposition 5.1. Let T : V → V be linear and let f = ab be an annihilating polynomial for T , where a, b ∈ F[x] are coprime. Then

V = Ker(a(T )) ⊕ Ker(b(T ))

and both factors are T -invariant.

Proof. By Proposition 2.11, there exist s, t such that as + bt = 1. But then a(T )s(T ) + b(T )t(T ) =
IdV and for all v ∈ V
(∗) a(T )s(T )v + b(T )t(T )v = v.
As f is annihilating
a(T )(b(T )t(T )v) = f (T )t(T )v = 0 and b(T )(a(T )s(T )v) = 0.
This shows that V = Ker(a(T )) + Ker(b(T )).
To show that this is a direct sum decomposition, assume that v ∈ Ker(a(T )) ∩ Ker(b(T )). But then
by equation (∗) we have v = 0 + 0 = 0. Thus
V = Ker(a(T )) ⊕ Ker(b(T )).
To see that both factors are T -invariant note that for v ∈ Ker(a(T ))
a(T )(T (v)) = T (a(T )v) = T (0) = 0
and similarly b(T )T (v) = 0 for v ∈ Ker(b(T )).
Theorem 5.2. [Primary Decomposition Theorem] Let mT be the minimal polynomial and write
it in the form
mT (x) = f1q1 (x) · · · frqr (x)
where the fi are distinct monic irreducible polynomials. Put Wi := Ker(fiqi (T )). Then
(i) V = W1 ⊕ · · · ⊕ Wr ;
(ii) Wi is T -invariant;
(iii) mT |Wi = fiqi .
Proof. Put a = f1q1 · · · fr−1qr−1 and b = frqr , and proceed by induction on r, applying Proposition 5.1.
Proposition 5.3. There exist unique distinct irreducible monic polynomials f1 , · · · , fr ∈ F[x] and positive integers ni ≥ qi > 0 (1 ≤ i ≤ r) such that

χT (x) = (−1)n f1n1 (x) · · · frnr (x),

where n = dim(V ) and mT (x) = f1q1 (x) · · · frqr (x).
Proof. Factor mT = f1q1 · · · frqr into distinct monic irreducibles over F[x] (this is unique, since factorisation in F[x] is unique). By Cayley-Hamilton, as mT | χT we see

χT (x) = f1n1 (x) · · · frnr (x) b(x)

for some ni ≥ qi and b(x) ∈ F[x] with b(x) coprime to f1n1 · · · frnr . Since χT and mT have the same
roots over F̄ (Theorem 2.18) we see b(x) has no roots and so must be constant; indeed comparing
leading coefficients b(x) = (−1)n where n = dim(V ).
Theorem 5.4. T is diagonalisable if and only if mT (x) is a product of distinct linear factors.
Proof. If T is diagonalisable then there exists a basis B of eigenvectors for V such that B [T ]B is
diagonal with entries from a list of distinct eigenvalues λ1 , . . . , λr . Then
m(x) = (x − λ1 ) . . . (x − λr )
is annihilating as m(T )v = 0 for any element v ∈ B and hence for any v ∈ V . It is also minimal
as every eigenvalue is a root of the minimal polynomial by Theorem 2.18.
Conversely, if mT (x) = (x − λ1 ) · · · (x − λr ) with the λi distinct, then by Theorem 5.2

V = Eλ1 ⊕ · · · ⊕ Eλr ,

with Eλi := Ker(T − λi I), is a direct sum decomposition of V into eigenspaces. Taking B = ∪i Bi with each Bi a basis for Eλi gives a basis of eigenvectors with respect to which T is diagonal.
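Over an algebraically closed field, this criterion is easy to apply: mT is a product of distinct linear factors exactly when it is coprime to its derivative. A small SymPy sketch; the helper is our own, not a library routine:

```python
import sympy as sp

x = sp.symbols('x')

def squarefree(m):
    """Check gcd(m, m') = 1, i.e. m has no repeated roots. Over C this is
    exactly the diagonalisability criterion for a minimal polynomial m."""
    return sp.gcd(m, sp.diff(m, x)) == 1

print(squarefree(x * (x - 1)))           # True: projections (Example 5.5)
print(squarefree((x - 1)**2 * (x - 2)))  # False: not diagonalisable
```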
Example 5.5 A linear map P : V → V is a projection ⟺ P 2 = P ⟺ P (P − I) = 0. Hence mP (x) divides x(x − 1), leaving three cases:

mP (x) = x ⇒ P = 0;
mP (x) = x − 1 ⇒ P = I;
mP (x) = x(x − 1) ⇒ V = E0 ⊕ E1 and, with respect to a suitable basis B,

B [P ]B = [ 0  0 ]
          [ 0  I ].
Example 5.6 Let

A = [ 1  −1 ]
    [ 1   1 ],

χA (x) = (1 − x)(1 − x) + 1 = x2 − 2x + 2.

F = R ⇒ mA (x) = χA (x) has no roots,
⇒ A is not triangularisable, nor diagonalisable;
F = C ⇒ mA (x) = χA (x) = (x − (1 + i))(x − (1 − i))
⇒ A is diagonalisable;
F = F5 ⇒ mA (x) = χA (x) = (x − 3)(x − 4)
⇒ A is diagonalisable.
The goal of this chapter is to give a good description of linear transformations when restricted to
the invariant subspaces that occur in the Primary Decomposition Theorem.
Theorem 6.1. If T is nilpotent, then its minimal polynomial has the form mT (x) = xm for some
m and there exists a basis B of V such that:
B [T ]B = [ 0  ∗          ]
          [    0  ⋱      ]
          [       ⋱  ∗   ]
          [          0   ]    with each ∗ = 0 or 1.
The proof of this theorem is rather intricate, and best read in parallel with the illustrative example
which follows it.
Proof. As T is nilpotent, T n = 0 for some n, and hence mT (x)|xn . Thus mT (x) = xm for some m.
We have
{0} ⊊ Ker(T ) ⊊ Ker(T 2 ) ⊊ ... ⊊ Ker(T m−1 ) ⊊ Ker(T m ) = V.
By the minimality of m these inclusions are indeed strict as Ker(T k ) = Ker(T k+1 ) implies that
Ker(T k ) = Ker(T k+s ) for all s ≥ 0. (An easy exercise.)
For each 1 ≤ i ≤ m choose a subset Bi ⊆ Ker(T i ) whose image in Ker(T i )/Ker(T i−1 ) is a basis of this quotient. Note that |Bi | must then be dim(Ker(T i )) − dim(Ker(T i−1 )). (We shall shortly make a particular choice of these sets.) Then

B1 ∪ · · · ∪ Bm

is a basis for V . More explicitly, considering T |Ker(T m−1 ) and by induction we find that
B1 ∪ · · · ∪ Bm−1
is a basis for Ker(T m−1 ). Now the image of Bm is a basis for the quotient V /Ker(T m−1 ) and so applying Proposition 3.3 we find that
B1 ∪ · · · ∪ Bm−1 ∪ Bm
is a basis for V .
Next we claim that, for each 1 ≤ i ≤ m − 1, the image of T (Bi+1 ) is linearly independent in Ker(T i )/ Ker(T i−1 ). (Note here that Bi+1 ⊂ Ker(T i+1 ).) To see why, write Bi+1 = {w1 , · · · , wt }. Suppose there exist a1 , · · · , at ∈ F with
a1 T (w1 ) + · · · + at T (wt ) + Ker(T i−1 ) = 0 in Ker(T i )/Ker(T i−1 ).
Then

T (a1 w1 + · · · + at wt ) ∈ Ker(T i−1 )
and so

a1 w1 + · · · + at wt ∈ Ker(T i ).
Hence

a1 w1 + · · · + at wt + Ker(T i ) = 0 in Ker(T i+1 )/Ker(T i ),

and as the image of Bi+1 in Ker(T i+1 )/Ker(T i ) is linearly independent, we deduce a1 = · · · = at = 0, proving the claim.
Note that since Bi+1 has size dim(Ker(T i+1 )) − dim(Ker(T i )), by our key observation we must have

|Bi+1 | ≤ |Bi | = dim(Ker(T i )) − dim(Ker(T i−1 ))

for 1 ≤ i ≤ m − 1.
We now make our particular choice of the sets Bi , working downwards from i = m. First take Bm to be any set whose image in V /Ker(T m−1 ) is a basis, and put Em := Bm . By the claim above, the image of T (Bm ) is linearly independent in

Ker(T m−1 ) / Ker(T m−2 ).
Thus we can extend that set to a basis for the quotient Ker(T m−1 ) / Ker(T m−2 ); put another way, working in Ker(T m−1 ) itself we extend the set T (Bm ) to a set

Bm−1 := T (Bm ) ∪ Em−1

whose image in Ker(T m−1 )/Ker(T m−2 ) is a basis.
We now repeat this process of, for i = m − 1, m − 2, · · · , 2, considering the image of T (Bi ) in the
quotient Ker(T i−1 ) / Ker(T i−2 ) (which is linearly independent), and extending T (Bi ) to a set
Bi−1 := T (Bi ) ∪ Ei−1
whose image in Ker(T i−1 )/Ker(T i−2 ) is a basis.
With respect to this basis, ordered in this way, we get a block diagonal matrix

B [T ]B = [ Am          ]
          [     ⋱      ]
          [         A1 ]

with each Ai (m ≥ i ≥ 1) itself a block diagonal matrix consisting of |Ei | many Jordan blocks Ji of size i × i; here

Ji := [ 0  1          ]
      [    0  ⋱      ]
      [       ⋱  1   ]
      [          0   ].
Example 6.2 Consider the linear map T : R3 → R3 given (with respect to the standard basis) by the matrix

T = [ −2  −1   1 ]
    [ 14   7  −7 ]
    [ 10   5  −5 ],

which satisfies T 2 = 0. We have

{0} ⊊ Ker(T ) ⊊ Ker(T 2 ) = R3
with
Ker(T ) = ⟨(1, 0, 2)t , (0, 1, 1)t ⟩

and

Ker(T 2 ) / Ker(T ) = ⟨(1, 0, 0)t + Ker(T )⟩.
Note the dimension jumps here are 2 and 1. So we may choose
B2 = {(1, 0, 0)t } (= E2 )
B1 = T (B2 ) ∪ E1 = {(−2, 14, 10)t , (0, 1, 1)t }
and B = B1 ∪ B2 = ∪v∈E2 {T (v), v} ∪ ∪v∈E1 {v} = {(−2, 14, 10)t , (1, 0, 0)t , (0, 1, 1)t }
Hence
B [T ]B = [ 0  1  0 ]
          [ 0  0  0 ]
          [ 0  0  0 ].
Corollary 6.3. Let T : V → V be a linear map with minimal polynomial mT (x) = (x − λ)m . Then there exists a basis B of V such that B [T ]B is block diagonal with each block of the form Ji (λ) := λIi + Ji for some i ≤ m.

Proof. T − λI is nilpotent with minimal polynomial xm . We may apply Theorem 6.1. So there
exists a basis B such that B [T − λI]B is block diagonal with blocks Ji and hence
B [T ]B = λI +B [T − λI]B
is of the desired form.
Theorem 6.4. Let V be finite dimensional and let T : V → V be a linear map with minimal
polynomial
mT (x) = (x − λ1 )m1 · · · (x − λr )mr .
Then there exists a basis B of V such that B [T ]B is block diagonal and each diagonal block is of
the form Ji (λj ) for some 1 ≤ i ≤ mj and 1 ≤ j ≤ r.
Note: (1) If F is an algebraically closed field, such as C, then the minimal polynomial will always
split into a product such as in the theorem.
(2) There could be several Ji (λj ) for each pair (i, j) (or none, but there is at least one block for
i = mj for each 1 ≤ j ≤ r, that is, one of maximal size for each eigenvalue). For each 1 ≤ j ≤ r,
the number of Jordan blocks Ji (λj ) for 1 ≤ i ≤ mj is determined by, and determines, the sequence
of dimensions dim Ker((T − λj I)i ) for 1 ≤ i ≤ mj . As this sequence of dimensions depends only upon T , it follows that the Jordan form is unique, up to the ordering of the blocks.
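Note (2) is effective: the number of Jordan blocks of each size can be read off from ranks, via dim Ker((A − λI)i ) = n − rank((A − λI)i ). A sketch with NumPy; the function name is ours:

```python
import numpy as np

def jordan_block_counts(A, lam, m):
    """Number of Jordan blocks J_i(lam) of each size i = 1..m for A,
    recovered from the kernel dimensions of (A - lam I)^i as in Note (2)."""
    n = A.shape[0]
    N = A - lam * np.eye(n)
    # d[i] = dim Ker((A - lam I)^i) = n - rank((A - lam I)^i), with d[0] = 0.
    d = [0] + [n - np.linalg.matrix_rank(np.linalg.matrix_power(N, i))
               for i in range(1, m + 2)]
    # d[i] - d[i-1] counts the blocks of size >= i, so the number of
    # blocks of size exactly i is the difference of consecutive jumps.
    return {i: (d[i] - d[i - 1]) - (d[i + 1] - d[i]) for i in range(1, m + 1)}

# The nilpotent 3x3 example above: one block of size 2, one of size 1.
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
print(jordan_block_counts(A, 0.0, 2))   # {1: 1, 2: 1}
```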
Example 6.5 Let T : R3 → R3 be given by the matrix

A = [ 3   0   1 ]
    [ −1  1  −1 ]
    [ 0   1   2 ].

Then χT (x) = det(A − xI) = · · · = (2 − x)3 and mT (x) = (x − 2)3 .
(A − 2I) = [ 1   0   1 ]
           [ −1  −1  −1 ]
           [ 0   1   0 ]

(A − 2I)2 = [ 1   1   1 ]
            [ 0   0   0 ]
            [ −1  −1  −1 ]
We have
{0} ⊊ Ker(A − 2I) ⊊ Ker((A − 2I)2 ) ⊊ Ker((A − 2I)3 ) = R3
Note the dimensions increase by exactly one at each step. We choose

B3 = {(1, 0, 0)t }.

Here after choosing B3 we may make no further choices (note E3 = B3 and E2 , E1 are empty). So we have
B = B1 ∪ B2 ∪ B3 = ∪v∈E3 {(T − 2I)2 (v), (T − 2I)(v), v} = {(1, 0, −1)t , (1, −1, 0)t , (1, 0, 0)t }
Put
P = [ 1   1   1 ]
    [ 0   −1  0 ]
    [ −1  0   0 ].
Then
P −1 AP = [ 2  1  0 ]
          [ 0  2  1 ]
          [ 0  0  2 ].
7 Dual Spaces
Linear maps from a vector space to the ground field play a special role. They have a special name,
linear functional, and the collection of all of them form the dual space.
Definition 7.1. Let V be a vector space over F. Its dual V ′ is the vector space of linear maps from V to F, i.e. V ′ = Hom(V, F). Its elements are called linear functionals.
Example 7.2
(1) Let V = C([0, 1]) be the vector space of continuous functions on [0, 1]. Then integration, the map V → R which sends f to ∫₀¹ f (t) dt, is a linear functional:

∫₀¹ (λf + g)(t) dt = ∫₀¹ (λf (t) + g(t)) dt = λ ∫₀¹ f (t) dt + ∫₀¹ g(t) dt for all f, g ∈ V, λ ∈ R.
(2) Let V be the vector space of finite sequences, that is, the space of sequences (a1 , a2 , a3 , · · · ) with ai ∈ R and only finitely many ai non-zero.
Theorem 7.3. Let V be finite dimensional and let B = {e1 , . . . , en } be a basis for V . Define the
dual e′i of ei (relative to B) by

e′i (ej ) = δij = { 1 if i = j;
                 { 0 if i ≠ j.

Then B′ := {e′1 , . . . , e′n } is a basis for V ′ , the dual basis. In particular, the assignment ei ↦ e′i defines an isomorphism of vector spaces, and so dim V = dim V ′ .
Proof. Assume for some ai ∈ F we have a1 e′1 + · · · + an e′n = 0. Then for all j,

0 = 0(ej ) = (a1 e′1 + · · · + an e′n )(ej ) = a1 e′1 (ej ) + · · · + an e′n (ej ) = aj ,

so B′ is linearly independent.
Let f ∈ V ′ and put ai := f (ei ). Then f = a1 e′1 + · · · + an e′n , since both sides evaluate to ai on ei and any linear map is determined entirely by its values on any basis. Hence B′ is spanning.
Note though that for v ∈ V the symbol “v ′ ” on its own has no meaning.
Example 7.4
(1) If V = Rn (column vectors) then we may “naturally” identify V ′ with the space (Rn )t of
row vectors. The dual basis of the standard basis of Rn is given by the row vectors e′i =
(0, . . . , 0, 1, 0 . . . , 0) with the 1 at the i-th place. (This identification is “natural” in the sense that then e′i (ej ) = e′i ej as a matrix product, and more generally, to evaluate a linear functional in V ′ on an element of V we take the product of the 1 × n and n × 1 vectors representing them with respect to the standard basis and its dual.)
(2) If V is the set of finite sequences then V ′ may be identified with the set of all infinite sequences, as any f ∈ V ′ is determined uniquely by its values (f (e1 ), f (e2 ), · · · ) on a basis. Note that, in this case, V is not isomorphic to V ′ , which shows that the condition on the dimension in Theorem 7.3 is necessary.
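The identification in (1) makes the dual basis of an arbitrary basis of Rn computable: if the basis vectors are the columns of an invertible matrix P , the dual basis vectors are the rows of P −1 . A small sketch, with a made-up basis:

```python
import numpy as np

# A basis of R^2: the columns of P.
P = np.array([[1.0, 1.0],
              [0.0, 1.0]])

# The dual basis (as row vectors) consists of the rows of P^{-1}:
# row i applied to column j gives delta_ij, i.e. D @ P = I.
D = np.linalg.inv(P)
assert np.allclose(D @ P, np.eye(2))

# Evaluating the functional e'_1 on a vector v is a row-column product;
# the result is the coefficient of v along the first basis vector.
v = np.array([3.0, 4.0])
print(D[0] @ v)   # -1.0 here, since v = -1*(1,0) + 4*(1,1)
```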
Theorem 7.5. Let V be a finite dimensional vector space. Then, V → (V ′ )′ =: V ′′ defined by
v ↦ Ev is a natural linear isomorphism; here Ev is the evaluation map at v defined by
Ev (f ) := f (v) for f ∈ V ′ .
Proof. The map v ↦ Ev itself is linear, as Eλv+w = λEv + Ew for all v, w ∈ V and λ ∈ F, since each functional f is linear.
This map is also injective: if Ev = 0, then Ev (f ) = f (v) = 0 for all f ∈ V ′ . If v ≠ 0, then we can extend {e1 = v} to a basis B of V . For f = e′1 we then have Ev (e′1 ) = e′1 (e1 ) = 1, which contradicts the fact Ev (e′1 ) = 0. Hence v = 0, which proves that the assignment v ↦ Ev is injective.
By Theorem 7.3,
dim(V ) = dim(V ′ ) = dim(V ′ )′ .
Thus it follows from the injectivity and the Rank-Nullity Theorem that the assignment is also
surjective.
Also note that when c = 0 different choices of b can define the same hyperplane (that is, scaling b
does not change the hyperplane). So different functionals can have the same kernel.
7.1 Annihilators
Definition 7.6. Let U ⊆ V be a subspace. The annihilator of U is

U 0 := {f ∈ V ′ | f (u) = 0 for all u ∈ U } ⊆ V ′ .

Thus f ∈ V ′ lies in U 0 if and only if f |U = 0.

Lemma 7.7. U 0 is a subspace of V ′ .

Proof. Let f, g ∈ U 0 and λ ∈ F. Then for all u ∈ U we have (f + λg)(u) = f (u) + λg(u) = 0. So f + λg ∈ U 0 . Also, 0 ∈ U 0 and U 0 ≠ ∅.
Theorem 7.8. Let U be a subspace of a finite dimensional vector space V . Then dim(U 0 ) = dim(V ) − dim(U ).

Proof. Let {e1 , · · · , em } be a basis for U and extend it to a basis {e1 , · · · , en } for V . Let
{e′1 , · · · , e′n } be the dual basis. As e′j (ei ) = 0 for j = m + 1, · · · , n and i = 1, · · · , m,
e′j ∈ U 0 for j = m + 1, · · · , n.
Conversely let f ∈ U 0 . Then there exist ai ∈ F such that f = a1 e′1 + · · · + an e′n . As ei ∈ U for i = 1, · · · , m, we have ai = f (ei ) = 0 for such i. So f ∈ ⟨e′m+1 , · · · , e′n ⟩.
Thus U 0 = ⟨e′m+1 , · · · , e′n ⟩, and as this set of spanning vectors is a subset of the dual basis, it is linearly independent. Thus

dim(U 0 ) = n − m = dim(V ) − dim(U ).
Lemma 7.9. Let U and W be subspaces of a vector space V . Then:

(1) U ⊆ W ⇒ W 0 ⊆ U 0 ;
(2) (U + W )0 = U 0 ∩ W 0 ;
(3) U 0 + W 0 ⊆ (U ∩ W )0 , with equality if dim(V ) is finite.
Proof.
(1) f ∈ W 0 ⇒ ∀w ∈ W : f (w) = 0
⇒ ∀u ∈ U ⊆ W : f (u) = 0
⇒ f ∈ U 0.
(2) f ∈ (U + W )0 ⇐⇒ ∀u ∈ U, ∀w ∈ W : f (u + w) = 0
⇐⇒ ∀u ∈ U : f (u) = 0 and ∀w ∈ W : f (w) = 0
⇐⇒ f ∈ U 0 ∩ W 0 .
(3) f ∈ U 0 + W 0 ⇒ ∃ g ∈ U 0 and h ∈ W 0 : f = g + h
⇒ ∀x ∈ U ∩ W : f (x) = g(x) + h(x) = 0
⇒ f ∈ (U ∩ W )0 .
It follows that U 0 + W 0 ⊆ (U ∩ W )0 . If V is finite dimensional, we show that the two spaces have the same dimension and thus are equal (write n := dim V ):

dim(U 0 + W 0 ) = dim(U 0 ) + dim(W 0 ) − dim(U 0 ∩ W 0 )
= dim(U 0 ) + dim(W 0 ) − dim((U + W )0 ) by (2)
= (n − dim U ) + (n − dim W ) − (n − dim(U + W )) by Theorem 7.8
= n − (dim U + dim W − dim(U + W ))
= n − dim(U ∩ W ) = dim((U ∩ W )0 ).
Note: For the last part of the proof we also used the formula familiar from Prelims: for U and W finite dimensional,

dim(U + W ) = dim(U ) + dim(W ) − dim(U ∩ W ).
Theorem 7.10. Let U be a subspace of a finite dimensional vector space V . Under the natural map V → V ′′ (:= (V ′ )′ ) given by v ↦ Ev , U is mapped isomorphically to U 00 (:= (U 0 )0 ).
Proof. Let us here write E : V → V ′′ for the natural isomorphism v ↦ Ev . For v ∈ V , the functional Ev is in U 00 if and only if for all f ∈ U 0 we have Ev (f ) (= f (v)) = 0. Hence, if v ∈ U
then Ev ∈ U 00 and thus

U ≅ E(U ) ⊆ U 00 .
When V is finite dimensional, by Theorem 7.8 we also have that

dim(U 00 ) = dim(V ′ ) − dim(U 0 ) = dim(V ) − (dim(V ) − dim(U )) = dim(U ),

and thus U ≅ E(U ) = U 00 .
Theorem 7.11. Let U ⊆ V be a subspace. Then there exists a natural isomorphism
U 0 ≃ (V /U )′
The isomorphism sends f ∈ U 0 to the functional f̄ : V /U → F given by f̄ (v + U ) := f (v), which is well-defined as f |U = 0. In the finite dimensional case, considering dimensions of both sides gives the result.
The assignment “V 7→ V ′ ” is functorial in the sense that a map between two spaces gives a map
between the dual spaces (but in the opposite direction).
Definition 7.12. Let T : V → W be a linear map of vector spaces. Define the dual map by

T ′ : W ′ → V ′ ,  f ↦ f ◦ T.
This definition is best illustrated by drawing a little triangular diagram (come to the lectures or
draw it yourself).
Note that T ′ (f ) is indeed linear on V : for v, w ∈ V and λ ∈ F,

T ′ (f )(λv + w) = f (T (λv + w)) = λf (T v) + f (T w) = λT ′ (f )(v) + T ′ (f )(w),

as required.
Theorem 7.14. Let V and W be two finite dimensional vector spaces. The assignment T 7→ T ′
defines a natural isomorphism from hom(V, W ) to hom(W ′ , V ′ ).
Proof. (a little tedious) We first check that the assignment T ↦ T ′ is linear in T . Let T, S ∈ hom(V, W ), λ ∈ F. We need to show (T + λS)′ = T ′ + λS ′ , an identity of maps from W ′ to V ′ . So let f ∈ W ′ . We now need to show (T + λS)′ (f ) = (T ′ + λS ′ )(f ), an identity of functionals on V . So (finally!) let v ∈ V . Then we have

(T + λS)′ (f )(v) = f ((T + λS)(v)) = f (T (v)) + λf (S(v)) = ((T ′ + λS ′ )(f ))(v),

and so (T + λS)′ = T ′ + λS ′ .
Next, suppose T ′ = 0. Then for all f ∈ W ′ and v ∈ V ,

T ′ (f )(v) := f (T v) =: ET v (f ) = 0.
But then ET v = 0, hence T v = 0 by Theorem 7.5 (applied to W ). Since this is true for all v ∈ V ,
we have T = 0. Thus, the map defined by T 7→ T ′ is injective.
As dim hom(V, W ) = dim hom(W ′ , V ′ ), injectivity implies that the assignment is also surjective.

Finally, we compute the matrix of T ′ . Let BV = {e1 , · · · , en } and BW = {x1 , · · · , xm } be bases for V and W , with dual bases B′V and B′W , and let

BW [T ]BV = A = (aij ).
Then

T (ej ) = a1j x1 + · · · + amj xm and hence x′i (T (ej )) = aij .
Let
B′V [T ′ ]B′W = B = (bij ).
Then

T ′ (x′i ) = b1i e′1 + · · · + bni e′n and hence (T ′ (x′i ))(ej ) = bji .

By definition we also have (T ′ (x′i ))(ej ) = x′i (T (ej )), and hence aij = bji and At = B.
Notice, in finite dimension, by Theorems 7.5 and 7.11 that (U 0 )′ is naturally isomorphic to the
quotient space V /U . So if you don’t like quotient spaces, you can work instead with duals of
annihilators (!). Challenge: prove the triangular form (Theorem 4.5) using duals of annihilators
instead of quotient spaces. (It is easier in fact just to work with annihilators, and simultaneously
prove a matrix has both an upper and lower triangular form by induction, but the challenge is a
good work-out.) Another good challenge is to figure out the natural isomorphism from V /U to
(U 0 )′ .
8 Inner Product Spaces

Recall from Prelims Geometry and Linear Algebra that the “dot product” on Rn (column vectors) is given by

⟨v, w⟩ := v t w,

and the standard inner product on Cn by

⟨v, w⟩ := v̄ t w.
Note here we conjugate the first vector (whereas we followed the other convention in Prelims and
conjugated the second vector). We’ll call these the usual inner products on Rn and Cn . They
endow these spaces with a notion of length and distance (and angles for Rn ), and we will study
linear maps which behave in certain ways with respect to these notions, e.g., maps which preserve
distance.
Definition 8.1. Let V be a vector space over F. A bilinear form on V is a map

F : V × V → F

which is linear in each variable. We say,
F is symmetric if: F (v, w) = F (w, v) for all v, w ∈ V .
F is non-degenerate if: F (v, w) = 0 for all v ∈ V implies w = 0.
Only the last definition is new. When F = R we'll say F is positive definite if for all v ≠ 0 ∈ V : F (v, v) > 0. Note that a positive definite form is always non-degenerate (since F (v, v) cannot be 0 for v ≠ 0).
Example 8.2
where c is the speed of light (that’s fast).√Then F is bilinear, symmetric, non-degenerate, but
not positive definite. For example, v = ( c, 0, 0, 1) 6= 0 but F (v, v) = 0.
If F is positive definite then it is non-degenerate, but this example shows the converse does
not hold.
A real vector space V endowed with a bilinear, symmetric positive definite form F (·, ·) is (as you
know) called an inner product space. We usually write the form as h·, ·i.
Definition 8.3. Let V be a vector space over C. A sesquilinear form on V is a map

F : V × V → C

which is conjugate linear in the first variable and linear in the second. We say F is conjugate symmetric if F (v, w) equals the complex conjugate of F (w, v) for all v, w ∈ V and, if so, F (v, v) ∈ R as F (v, v) equals its own conjugate. We call a conjugate symmetric form F positive definite if for all v ≠ 0 ∈ V : F (v, v) > 0 (note F (v, v) is necessarily real).
Example 8.4 Let V = Cn and

F (v, w) = v̄ t Aw

for some A ∈ Mn×n (C). Then F is a sesquilinear form, and F is conjugate symmetric if and only if A = Āt , as for all i, j = 1, · · · , n we have

F (ei , ej ) = ēti A ej = aij .
Moreover, F is non-degenerate if and only if A is invertible:

A is singular
⟺ ∃w ∈ V with w ≠ 0 : Aw = 0
⟺ ∃w ∈ V with w ≠ 0 s.t. ∀v ∈ V : v̄ t Aw = 0
⟺ F is degenerate (i.e. not non-degenerate).
A complex vector space V with a sesquilinear, conjugate symmetric, positive definite form F = h·, ·i
is (as you know) called a (complex) inner product space.
Given a real or complex inner product space, we say {w1 , · · · , wn } are mutually orthogonal if ⟨wi , wj ⟩ = 0 for all i ≠ j, and they are orthonormal if they are mutually orthogonal and ⟨wi , wi ⟩ = 1 for each i.
Proposition 8.5. Let V be an inner product space over K (equal to R or C) and let {w1 , · · · , wn } ⊂ V be mutually orthogonal with wi ≠ 0 for all i. Then w1 , · · · , wn are linearly independent.
Proof. Assume λ1 w1 + · · · + λn wn = 0 for some λi ∈ K. Then for all j, ⟨wj , λ1 w1 + · · · + λn wn ⟩ = 0. But

⟨wj , λ1 w1 + · · · + λn wn ⟩ = λ1 ⟨wj , w1 ⟩ + · · · + λn ⟨wj , wn ⟩ = λj ⟨wj , wj ⟩.

As ⟨wj , wj ⟩ ≠ 0, we deduce λj = 0 for each j.
8.1 Gram-Schmidt orthonormalisation process

Let V be a finite dimensional inner product space with basis {v1 , · · · , vn }. Define, inductively,

w1 = v1
w2 = v2 − (⟨w1 , v2 ⟩/⟨w1 , w1 ⟩) w1
⋮
wk = vk − (⟨w1 , vk ⟩/⟨w1 , w1 ⟩) w1 − · · · − (⟨wk−1 , vk ⟩/⟨wk−1 , wk−1 ⟩) wk−1    (∗)
⋮

Then {w1 , · · · , wn } is an orthogonal basis for V .
Assuming inductively that ⟨w1 , · · · , wk−1 ⟩ = ⟨v1 , · · · , vk−1 ⟩ and that w1 , · · · , wk−1 are mutually orthogonal, the identity (∗) shows that ⟨w1 , · · · , wk ⟩ = ⟨v1 , · · · , vk ⟩ and that, for each j < k,

⟨wj , wk ⟩ = ⟨wj , vk ⟩ − (⟨wj , vk ⟩/⟨wj , wj ⟩) ⟨wj , wj ⟩ = 0.
Put

ui = wi /∥wi ∥ where ∥wi ∥ = √⟨wi , wi ⟩.

Then E = {u1 , · · · , un } is an orthonormal basis.
Corollary 8.6. Every finite dimensional inner product space V over K = R, C has an orthonormal
basis.
Note that the Gram-Schmidt process tells us that given such a V and a subspace U , any orthonormal basis for U may be extended to one for V (think about why). This is very important.
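The process translates directly into code. A minimal sketch for Rn with the usual dot product (the function name is ours; feed in a basis of the subspace in question):

```python
import numpy as np

def gram_schmidt(vs):
    """Orthonormalise a linearly independent list of vectors in R^n,
    following (*): subtract from v_k its components along the earlier
    w's, then normalise."""
    us = []
    for v in vs:
        w = v.astype(float)
        for u in us:                 # u already has norm 1
            w = w - (u @ w) * u      # remove the component along u
        us.append(w / np.linalg.norm(w))
    return us

us = gram_schmidt([np.array([1.0, 1.0, 0.0]),
                   np.array([1.0, 0.0, 1.0])])
# Check orthonormality: <u_i, u_j> = delta_ij.
G = np.array([[u @ w for w in us] for u in us])
assert np.allclose(G, np.eye(2))
```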
8.2 Orthogonal complements and duals of inner product spaces
For v ∈ V , consider the linear functional

⟨v, ·⟩ : V → K
w ↦ ⟨v, w⟩.
Theorem 8.7. The map defined by v ↦ ⟨v, ·⟩ is a natural injective R-linear map φ : V → V ′ , which is an isomorphism when V is finite dimensional.
Note here that every complex vector space V is in particular a real vector space, and if it is finite
dimensional then
2 dimC V = dimR V.
Proof. Note φ : v ↦ ⟨v, ·⟩, and we must first show φ(v + λw) = φ(v) + λφ(w) for all v, w ∈ V, λ ∈ R, i.e.

⟨v + λw, ·⟩ = ⟨v, ·⟩ + λ⟨w, ·⟩.
And this is true. So φ is R-linear. (Note it is conjugate linear for λ ∈ C.) As ⟨·, ·⟩ is non-degenerate, ⟨v, ·⟩ is not the zero functional unless v = 0. Hence, φ is injective. If V is finite dimensional, then dimR V = dimR V ′ , and hence Im φ = V ′ . Thus, φ is surjective and hence an R-linear isomorphism.
Definition 8.8. Let U ⊆ V be a subspace of an inner product space V . The orthogonal complement is defined as follows:

U ⊥ := {v ∈ V | ⟨v, u⟩ = 0 for all u ∈ U }.

Proposition 8.9. Let U and W be subspaces of an inner product space V . Then:

1. U ⊥ is a subspace of V
2. V = U ⊕ U ⊥ if dim V < ∞ (in particular dim U ⊥ = dim V − dim U )
3. (U + W )⊥ = U ⊥ ∩ W ⊥
4. (U ∩ W )⊥ ⊇ U ⊥ + W ⊥ (with equality if dim V < ∞)
5. U ⊆ (U ⊥ )⊥ (with equality if V is finite dimensional)
Outside of finite dimension, part 2. may fail, and the inclusions in 4. and 5. may be strict (you’ll
see this for 2. and 5. on the problem sheets).
Proof. 1. and 2.: Exercise (for 2., extend an orthonormal basis for U to one for V ).
3. Exercise.
4. Exercise.
5. Let u ∈ U . Then, for all w ∈ U ⊥ , we have ⟨w, u⟩ = 0, and hence ⟨u, w⟩ = 0 (taking conjugates if K = C), so u ∈ (U ⊥ )⊥ . If V is finite dimensional, then by 2.

dim((U ⊥ )⊥ ) = dim V − dim U ⊥ = dim U

and so U = (U ⊥ )⊥ .
Proposition 8.11. Let V be finite dimensional. Then, under the R-linear isomorphism φ : V → V ′ given by v ↦ ⟨v, ·⟩, the space U ⊥ maps isomorphically to U 0 (considered as R vector spaces).
Example 8.12 Let V be the vector space of real polynomials of degree at most two. Define

⟨f, g⟩ := f (1)g(1) + f (2)g(2) + f (3)g(3).

Then ⟨ , ⟩ is bilinear, symmetric and positive definite, for:

⟨f, f ⟩ = 0 ⇒ f (1) = f (2) = f (3) = 0
⇒ f is a polynomial of degree ≥ 3 or f = 0.

Since f has degree at most two, f = 0.
Let U = ⟨1, t⟩ be the subspace of polynomials of degree at most one. We will apply the Gram-Schmidt process to obtain an orthonormal basis for U , starting with the standard basis {1, t}. Put
u1 = 1/√3 (Note: ⟨1, 1⟩ = 3)

w2 = t − ⟨t, u1 ⟩ u1
   = t − (1/√3)(1 + 2 + 3)(1/√3) = t − 2

u2 = w2 /∥w2 ∥ = (t − 2)/((−1)2 + 02 + 12 )1/2 = (t − 2)/√2.
Hence we can take, for the orthogonal projection of t2 onto U (its best approximation in U ),

f = ⟨t2 , 1/√3⟩ (1/√3) + ⟨t2 , (t − 2)/√2⟩ ((t − 2)/√2)
  = (1/3)(1 + 4 + 9) + (1/2)(−1 + 0 + 9)(t − 2)
  = 4t − 10/3.
Definition 8.13. Given a linear map T : V → V , a linear map T ∗ : V → V is its adjoint if for all v, w ∈ V ,

⟨v, T (w)⟩ = ⟨T ∗ (v), w⟩.    (∗)
If T̃ : V → V also satisfies (∗), then for all v, w ∈ V we have ⟨T ∗ (v) − T̃ (v), w⟩ = ⟨v, T (w)⟩ − ⟨v, T (w)⟩ = 0. As ⟨·, ·⟩ is non-degenerate,

T ∗ (v) − T̃ (v) = 0,

and so T ∗ = T̃ ; that is, the adjoint is unique if it exists.
Theorem 8.15. Let T : V → V be linear where V is finite dimensional. Then the adjoint exists
and is linear.
Proof. Fix v ∈ V and consider the map

w ↦ ⟨v, T (w)⟩.

Then ⟨v, T (·)⟩ is a linear functional, as T is linear and as ⟨ , ⟩ is linear in the second coordinate.
As V is finite dimensional, φ : V → V ′ given by φ(u) = ⟨u, ·⟩ is an R-linear isomorphism, and in particular a surjective map. Thus there exists u ∈ V such that ⟨v, T (·)⟩ = ⟨u, ·⟩; defining T ∗ (v) := u gives

⟨v, T (·)⟩ = ⟨T ∗ (v), ·⟩, i.e., ⟨v, T (w)⟩ = ⟨T ∗ (v), w⟩ for all w ∈ V .
To see that T ∗ is linear, note that for v1 , v2 ∈ V , λ ∈ K and all w ∈ V ,

⟨T ∗ (v1 + λv2 ), w⟩ = ⟨v1 + λv2 , T (w)⟩ = ⟨T ∗ (v1 ), w⟩ + λ̄⟨T ∗ (v2 ), w⟩ = ⟨T ∗ (v1 ) + λT ∗ (v2 ), w⟩.

(These equalities have nothing to do with our actual definition of T ∗ , but just follow from the fact that by construction it satisfies ⟨v, T (w)⟩ = ⟨T ∗ (v), w⟩ for all v, w ∈ V .) As ⟨ , ⟩ is non-degenerate (equivalently, as φ is injective), it follows that T ∗ (v1 + λv2 ) = T ∗ (v1 ) + λT ∗ (v2 ), so T ∗ is linear.
Proposition 8.16. Let T : V → V be linear and let B = {e1 , · · · , en } be an orthonormal basis for V . Then

B [T ∗ ]B = ( B [T ]B )∗ ,

where for a matrix A we write A∗ := Āt for its conjugate transpose.
Note that
(1) Theorem 8.15 is false if V is not finite dimensional (the inner product defines a metric on V ,
and you need assumptions like the map being continuous with respect to this).
(3) For K = R and in finite dimension, under the isomorphism φ : V → V ′ , v ↦ ⟨v, · ⟩, the adjoint T ∗ is identified with the dual map T ′ , and an orthonormal basis B of V with its dual basis, so that:

B′ [T ′ ]B′ = ( B [T ]B )t = B [T ∗ ]B .
Lemma 8.17. Let S, T : V → V be linear maps on a finite dimensional inner product space and λ ∈ K. Then:

(1) (S + T )∗ = S ∗ + T ∗
(2) (λT )∗ = λ̄T ∗
(3) (ST )∗ = T ∗ S ∗
(4) (T ∗ )∗ = T
(5) If mT is the minimal polynomial of T then mT ∗ = m̄T , the polynomial whose coefficients are the complex conjugates of those of mT .
Proof. Exercise.
Lemma 8.20. If T is self-adjoint and U ⊆ V is T -invariant, then so is U ⊥ .
Theorem 8.21 (Spectral Theorem). Let V be a finite dimensional inner product space and T : V → V self-adjoint. Then V has an orthonormal basis consisting of eigenvectors of T .

Proof. By Lemma 8.19 there exists an eigenvalue λ ∈ R and v ≠ 0 such that T (v) = λv. Consider U = ⟨v⟩. Then U is T -invariant and by Lemma 8.20 the restriction of T to U ⊥ is again self-adjoint; the result follows by induction on dim V , applied to T |U ⊥ .
Let {e1 , . . . , en } be an orthonormal basis in Kn , where K = R, C, with the usual inner product.
Let A be the matrix with columns ej :
A = [e1 , . . . , en ].
Then
AĀt = Āt A = I and A−1 = Āt ,
that is to say A is orthogonal if K = R and unitary if K = C. More generally:
Definition 8.22. Let V be a finite dimensional inner product space and T : V → V be a linear
transformation. If T ∗ = T −1 then T is called
orthogonal when K = R;
unitary when K = C.
Lemma 8.23. Let V be a finite dimensional inner product space over K and T : V → V linear. The following are equivalent:

(1) T ∗ = T −1 ;
(2) T preserves inner products: ⟨v, w⟩ = ⟨T v, T w⟩ for all v, w ∈ V ;
(3) T preserves lengths: ∥v∥ = ∥T v∥ for all v ∈ V .
Proof.
Proposition 8.24. The length function determines the inner product: given two inner products ⟨ , ⟩1 and ⟨ , ⟩2 on V with ∥v∥1 = ∥v∥2 for all v ∈ V , we have ⟨ , ⟩1 = ⟨ , ⟩2 .

Proof. We have

⟨v + w, v + w⟩ = ⟨v, v⟩ + ⟨v, w⟩ + ⟨w, v⟩ + ⟨w, w⟩

and (for K = C)

⟨v + iw, v + iw⟩ = ⟨v, v⟩ + i⟨v, w⟩ − i⟨w, v⟩ + ⟨w, w⟩.
Hence,

Re⟨v, w⟩ = (1/2)(∥v + w∥2 − ∥v∥2 − ∥w∥2 )
Im⟨v, w⟩ = −(1/2)(∥v + iw∥2 − ∥v∥2 − ∥w∥2 ).
Note that inner product spaces are metric spaces with d(v, w) = ∥v − w∥ and orthogonal/unitary linear transformations are isometries, so we have another equivalence:
Definition 8.25. Let

O(n) := {A ∈ Mn (R) | At A = I}, the orthogonal group, and
U (n) := {A ∈ Mn (C) | Āt A = I}, the unitary group.

Lemma 8.26. Let T be orthogonal or unitary with eigenvalue λ (over R or C respectively). Then |λ| = 1: for if T (v) = λv with v ≠ 0, then

⟨v, v⟩ = ⟨T v, T v⟩ = λ̄λ⟨v, v⟩ = |λ|2 ⟨v, v⟩.

Corollary 8.27. Let A ∈ U (n). Then

| det A| = 1.
Proof. Working over C we know that det A is the product of all eigenvalues (with repetitions).
Hence,
| det A| = |λ1 λ2 · · · λn | = |λ1 ||λ2 | · · · |λn | = 1.
Lemma 8.28. Assume that V is finite dimensional and T : V → V with T ∗ T = Id. Then if U is
T -invariant so is U ⊥ .
Proof. Let u ∈ U ⊥ and w ∈ U . Then T w ∈ U , so

0 = ⟨u, T w⟩ = ⟨T ∗ u, w⟩ = ⟨T −1 u, w⟩.

Thus T −1 u ∈ U ⊥ , so U ⊥ is T −1 -invariant; as T is invertible and V is finite dimensional, it follows that U ⊥ is also T -invariant.
Theorem 8.29. Let V be a finite dimensional inner product space over C and let T : V → V be unitary. Then V has an orthonormal basis consisting of eigenvectors of T .

Proof. As C is algebraically closed, T has an eigenvector v, and U := ⟨v⟩ is T -invariant. By Lemma 8.28, U ⊥ is also T -invariant, and T |U ⊥ is again unitary. By induction on the dimension n := dim(V ), and noting that dim U ⊥ = n − 1, we may assume that there exists an orthonormal basis {e2 , · · · , en } of U ⊥ consisting of eigenvectors. Put e1 = v/∥v∥. Then {e1 , e2 , · · · , en } is an orthonormal basis of eigenvectors for V .
Corollary 8.30. Let A ∈ U (n). Then there exists P ∈ U (n) such that P −1 AP is diagonal.
Note that if A ∈ O(n) then A ∈ U (n) but A may not be diagonalisable over the reals!
Theorem 8.32. Let T : V → V be orthogonal and V be a finite dimensional real vector space.
Then there exists an orthonormal basis B such that:
B [T ]B = [ I                      ]
          [    −I                  ]
          [        Rθ1             ]
          [             ⋱         ]
          [                 Rθℓ   ]    with θi ≠ 0, π.
Proof. Put S := T + T ∗ = T + T −1 ; then S is self-adjoint, so by the Spectral Theorem V

decomposes into orthogonal eigenspaces V1 , . . . , Vk of S with distinct eigenvalues λ1 , . . . , λk . Note that each Vi is also T -invariant as for v ∈ Vi ,

S(T v) = T (Sv) = λi T v.
If λi ≠ ±2 then T |Vi does not have any real eigenvalues, as they would have to be ±1 by Lemma 8.26 (with product +1), forcing λi = ±2. So {v, T (v)} is linearly independent over the reals for v ≠ 0 ∈ Vi . Consider the plane W = ⟨v, T (v)⟩ spanned by v and T v. Then W is T -invariant as

T (T v) = (T S − I)(v) = λi T (v) − v ∈ W.
Hence W ⊥ is also T -invariant by Lemma 8.28. Repeating the argument for T |W ⊥ if necessary, we
see that Vi splits into 2-dimensional T -invariant subspaces. By our Example 8.31, with respect to
some orthonormal basis of W ,

T |W = Rθ = [ cos(θ)  −sin(θ) ]
            [ sin(θ)   cos(θ) ]

for some θ ≠ 0, π. (Note, the fact that T |W does not have any real eigenvalues implies that T |W is not a reflection and θ ≠ 0, π.)
Appendix

Recall that we proved the inclusion

U 0 + W 0 ⊆ (U ∩ W )0 ,

and, for inner product spaces, the analogous inclusion

U ⊥ + W ⊥ ⊆ (U ∩ W )⊥ .
This latter inclusion may be strict outside of finite dimension.5 So what about the former inclusion?
Let’s try and prove the reverse inclusion (U ∩ W )0 ⊆ U 0 + W 0 directly in the finite dimensional
case (rather than appealing to a dimension argument).
5 Thank you to David Seifert for an example, and helpful discussions around this appendix.
So let f ∈ (U ∩ W )0 , that is, f : V → F with f |U ∩W = 0. We need to find g ∈ U 0 and h ∈ W 0
with f = g + h.
First choose a subspace X ⊆ U with

(U ∩ W ) ⊕ X = U.
We can do this by taking a basis for U ∩ W and extending it to one for U . We call this a direct
complement for U ∩ W in U . Likewise we find Y ⊆ W with
(U ∩ W ) ⊕ Y = W.
Finally, noting that U + W = (U ∩ W ) ⊕ X ⊕ Y , choose a direct complement Z with

((U ∩ W ) ⊕ X ⊕ Y ) ⊕ Z = V.
Now we define g, h : V → F by giving them on each summand in this direct sum decomposition:

       U ∩ W    X    Y    Z
g      f /2     0    f    f /2
h      f /2     f    0    f /2
Then indeed g + h = f and g|U = 0 and h|W = 0 (note if 2 is not invertible in F this “symmetric”
construction can be easily modified).
Our proof does not mention dimensions. But we do use finite dimensionality, extending bases for
a subspace to the whole space (to show every subspace has a direct complement). Can this be
done outside of finite dimension too? Well yes, if we assume something called Zorn’s Lemma: this
is an axiom in mathematics which is (probably) not necessary for most of, for example, my own
subject number theory (and one which many mathematicians try to avoid). But it seems to be
unavoidable in certain contexts.