Linear Algebra
Contents
1 Introduction to Matrices
  1.1 Motivation
  1.2 Definition of a Matrix
      1.2.1 Special Matrices
  1.3 Matrix Operations
      1.3.1 Transpose and Conjugate Transpose of Matrices
      1.3.2 Sum and Scalar Multiplication of Matrices
      1.3.3 Multiplication of Matrices
      1.3.4 Inverse of a Matrix
  1.4 Some More Special Matrices
  1.6 Summary
3 Vector Spaces
  3.1 Vector Spaces: Definition and Examples
      3.1.1 Vector Subspace
  3.2 Linear Combination and Linear Span
      3.2.1 Linear Span
  3.3 Linear Independence
      3.3.1 Basic Results on Linear Independence
      3.3.2 Application to Matrices
  3.4 Basis of a Vector Space
      3.4.1 Main Results associated with Bases
      3.4.2 Constructing a Basis of a Finite Dimensional Vector Space
  3.5 Fundamental Subspaces Associated with a Matrix
  3.6 Fundamental Theorem of Linear Algebra and Applications
  3.7 Summary
9 Appendix
  9.1 Uniqueness of RREF
  9.2 Permutation/Symmetric Groups
  9.3 Properties of Determinant
  9.4 Dimension of W1 + W2
  9.5 When does Norm imply Inner Product
  9.6 Roots of a Polynomial
  9.7 Variational Characterizations of Hermitian Matrices
Index
Chapter 1
Introduction to Matrices
1.1 Motivation
Recall that at some stage, we have solved a linear system of 3 equations in 3 unknowns. But,
for clarity, let us start with a few linear systems of 2 equations in 2 unknowns.
The two linear systems represent a pair of non-parallel lines in R2. Note that x = 1, y = 1 is the unique solution of the given system as (1, 1) is the point of intersection of the two lines.
Here, we have three planes in R3 and an easy observation implies that the third equation
is the sum of the first two equations. Hence, the line of intersection of the first two planes
is contained in the third plane. Hence, this system has an infinite number of solutions, given by
x = 61 − 59k, y = −10 + 11k, z = k, with k an arbitrary real number.
For example, verify that for k = 1, we get x = 2, y = 1 and z = 1 as a possible solution.
Also,
$$\begin{bmatrix}1\\1\\2\end{bmatrix}\cdot 2+\begin{bmatrix}5\\6\\11\end{bmatrix}\cdot 1+\begin{bmatrix}4\\-7\\-3\end{bmatrix}\cdot 1=\begin{bmatrix}11\\1\\12\end{bmatrix}=\begin{bmatrix}1\\1\\2\end{bmatrix}\cdot 61+\begin{bmatrix}5\\6\\11\end{bmatrix}\cdot(-10)+\begin{bmatrix}4\\-7\\-3\end{bmatrix}\cdot 0.$$
x + 5y + 4z = 11
x + 6y − 7z = 1 (1.1.4)
2x + 11y − 3z = 13.
Here, we see that if we add the first two equations and subtract their sum from the third equation
then we are left with 0x + 0y + 0z = 1, which has no solution. That is, the above system
has no solution. I leave it to the readers to verify that there does not exist any x, y and
z such that
$$\begin{bmatrix}1\\1\\2\end{bmatrix}x+\begin{bmatrix}5\\6\\11\end{bmatrix}y+\begin{bmatrix}4\\-7\\-3\end{bmatrix}z=\begin{bmatrix}11\\1\\13\end{bmatrix}.$$
Remark 1.1.2. So, what we see above is “each of the linear systems gives us certain ‘relation-
ships’ between vectors which are ‘associated’ with the unknowns”. These relationships will lead
to the study of certain objects when we study “vector spaces”. They are as follows:
1. The first idea of 'relationship', which helps us to write a vector in terms of other vectors, will lead us to the study of 'linear combination' of vectors. So, $\begin{bmatrix}7\\6\end{bmatrix}$ is a 'linear combination' of $\begin{bmatrix}2\\2\end{bmatrix}$ and $\begin{bmatrix}5\\4\end{bmatrix}$. Similarly, $\begin{bmatrix}11\\1\\12\end{bmatrix}$ is a 'linear combination' of $\begin{bmatrix}1\\1\\2\end{bmatrix}$, $\begin{bmatrix}5\\6\\11\end{bmatrix}$ and $\begin{bmatrix}4\\-7\\-3\end{bmatrix}$.
2. Further, it also leads to the study of the 'linear span' of a set. A positive answer leads to the vector being an element of the 'linear span' and a negative answer to the vector being NOT an element of the linear span. For example, for $S=\left\{\begin{bmatrix}1\\1\\2\end{bmatrix},\begin{bmatrix}5\\6\\11\end{bmatrix},\begin{bmatrix}4\\-7\\-3\end{bmatrix}\right\}$, the vector $\begin{bmatrix}11\\1\\12\end{bmatrix}$ belongs to the 'linear span' of S, whereas $\begin{bmatrix}11\\1\\13\end{bmatrix}$ does NOT belong to the 'linear span' of S.
3. The idea of a unique solution leads us to the statement that the corresponding vectors are 'linearly independent'. For example, the set $\left\{\begin{bmatrix}2\\2\end{bmatrix},\begin{bmatrix}5\\4\end{bmatrix}\right\}\subseteq\mathbb{R}^2$ is 'linearly independent'. Whereas, the set $\left\{\begin{bmatrix}1\\1\\2\end{bmatrix},\begin{bmatrix}5\\6\\11\end{bmatrix},\begin{bmatrix}4\\-7\\-3\end{bmatrix}\right\}\subseteq\mathbb{R}^3$ is NOT 'linearly independent' as
$$\begin{bmatrix}1\\1\\2\end{bmatrix}\cdot(-59)+\begin{bmatrix}5\\6\\11\end{bmatrix}\cdot 11+\begin{bmatrix}4\\-7\\-3\end{bmatrix}\cdot 1=\begin{bmatrix}0\\0\\0\end{bmatrix}.$$
The horizontal arrays of a matrix are called its rows and the vertical arrays are called its
columns. A matrix A having m rows and n columns is said to be a matrix of size/ order
m × n and can be represented in either of the following forms:
$$A=\begin{bmatrix}a_{11}&a_{12}&\cdots&a_{1n}\\a_{21}&a_{22}&\cdots&a_{2n}\\\vdots&\vdots&\ddots&\vdots\\a_{m1}&a_{m2}&\cdots&a_{mn}\end{bmatrix}\quad\text{or}\quad A=\begin{pmatrix}a_{11}&a_{12}&\cdots&a_{1n}\\a_{21}&a_{22}&\cdots&a_{2n}\\\vdots&\vdots&\ddots&\vdots\\a_{m1}&a_{m2}&\cdots&a_{mn}\end{pmatrix},$$
where aij is the entry at the intersection of the ith row and j th column. One writes A ∈ Mm,n (F)
to mean that A is an m×n matrix with entries from the set F, or in short A = [aij ] or A = (aij ).
We write A[i, :] to denote the i-th row of A, A[:, j] to denote the j-th column of A and aij or
$(A)_{ij}$ or $A[i, j]$ for the $(i, j)$-th entry of A.
For example, if $A=\begin{bmatrix}1&3+i&7\\4&5&6-5i\end{bmatrix}$ then $A[1,:]=[1\ \ 3+i\ \ 7]$, $A[:,3]=\begin{bmatrix}7\\6-5i\end{bmatrix}$ and
$a_{22}=5$. Sometimes commas are inserted to differentiate between entries of a row vector. Thus,
A[1, :] may also be written as [1, 3 + i, 7]. A matrix having only one column is called a column
vector and a matrix with only one row is called a row vector. All our vectors will be column
vectors and will be represented by bold letters. A matrix of size 1 × 1 is also called a scalar
and is treated as such and hence we may or may not put it under brackets.
Definition 1.2.2. Two matrices A = [aij ], B = [bij ] ∈ Mm,n (C) are said to be equal if aij = bij ,
for each i = 1, 2, . . . , m and j = 1, 2, . . . , n.
In other words, two matrices are said to be equal if they have the same order and their
corresponding entries are equal.
Example 1.2.3. 1. Consider a system of linear equations 2x + 5y = 7 and 3x + 2y = 6.
Then, we identify it with the matrix $A=\begin{bmatrix}2&5&7\\3&2&6\end{bmatrix}$. Here, $A[:,1]=\begin{bmatrix}2\\3\end{bmatrix}$ and $A[:,2]=\begin{bmatrix}5\\2\end{bmatrix}$
are associated with the variables/unknowns x and y, respectively.
2. Let $A=\begin{bmatrix}0&0\\0&0\end{bmatrix}$ and $B=\begin{bmatrix}0&1\\0&0\end{bmatrix}$. Then, $A\neq B$ as $a_{12}\neq b_{12}$. Similarly, if $C=\begin{bmatrix}0&0\\0&0\\0&0\end{bmatrix}$ then
$A\neq C$ as they are of different sizes.
3. Let A ∈ Mn (F).
(a) Then, the entries a11 , a22 , . . . , ann are called the diagonal entries of A. They consti-
tute the principal diagonal of A.
(b) Then, A is said to be a diagonal matrix, denoted diag(a11, . . . , ann), if aij = 0
for i ≠ j. For example, the zero matrix 0n and $\begin{bmatrix}4&0\\0&1\end{bmatrix}$ are diagonal matrices.
(c) Then, A = diag(1, . . . , 1) is called the identity matrix, denoted In, or in short I.
For example, $I_2=\begin{bmatrix}1&0\\0&1\end{bmatrix}$ and $I_3=\begin{bmatrix}1&0&0\\0&1&0\\0&0&1\end{bmatrix}$.
(d) If A = αI, for some α ∈ F, then A is called a scalar matrix.
(e) Then, A is said to be an upper triangular matrix if aij = 0 for i > j.
(f) Then, A is said to be a lower triangular matrix if aij = 0 for i < j.
(g) Then, A is said to be triangular if it is an upper or a lower triangular matrix.
For example, $\begin{bmatrix}0&1&4\\0&3&-1\\0&0&-2\end{bmatrix}$ is upper triangular, $\begin{bmatrix}0&0&0\\1&0&0\\0&1&1\end{bmatrix}$ is lower triangular and the
matrices 0, I are upper as well as lower triangular matrices.
5. For 1 ≤ i ≤ n, define ei = In [:, i], a matrix of order n × 1. Then the column matrices
e1 , . . . , en are called the standard unit vectors or the standard basis of Mn,1 (C) or
Cn. The dependence on n is omitted as it is understood from the context. For example,
if $\mathbf{e}_1\in\mathbb{C}^2$ then $\mathbf{e}_1=\begin{bmatrix}1\\0\end{bmatrix}$, and if $\mathbf{e}_1\in\mathbb{C}^3$ then $\mathbf{e}_1=\begin{bmatrix}1\\0\\0\end{bmatrix}$.
1. the transpose of A, denoted AT , is an n × m matrix with (AT )ij = aji , for all i, j.
2. the conjugate transpose of A, denoted A∗, is an n × m matrix with $(A^*)_{ij}=\overline{a_{ji}}$ (the
complex conjugate of $a_{ji}$), for all i, j.
For example, if $A=\begin{bmatrix}1&4+i\\0&1-i\end{bmatrix}$ then $A^T=\begin{bmatrix}1&0\\4+i&1-i\end{bmatrix}$ and $A^*=\begin{bmatrix}1&0\\4-i&1+i\end{bmatrix}$. Note that $A^*\neq A^T$.
Note that if $\mathbf{x}=\begin{bmatrix}1\\2\end{bmatrix}$ is a column vector then $\mathbf{x}^T=[1\ \ 2]$ and $\mathbf{x}^*$ are row vectors.
Proof. Let A = [aij], A∗ = [bij] and (A∗)∗ = [cij]. Clearly, the order of A and (A∗)∗ is the
same. Also, by definition $c_{ij}=\overline{b_{ji}}=\overline{\overline{a_{ij}}}=a_{ij}$ for all i, j.
1. A + B = B + A (commutativity).
2. (A + B) + C = A + (B + C) (associativity).
3. k(`A) = (k`)A.
4. (k + `)A = kA + `A.
as complex numbers commute. The other parts are left for the reader.
3. Let A ∈ Mn (C). Then there exist matrices B and C such that A = B + C, where B T = B
(symmetric matrix) and C T = −C (skew-symmetric matrix).
1 + i −1 " #
2 3 −1 ∗ ∗
4. Let A =
2 3 and B = 1 1 − i 2 . Compute A + B and B + A .
i 1
We now come to the most important operation between matrices, called the matrix multipli-
cation. We define it as follows.
Definition 1.3.8. Let A = [aij ] ∈ Mm,n (C) and B = [bij ] ∈ Mn,r (C). Then, the product of A
and B, denoted AB, is a matrix C = [cij ] ∈ Mm,r (C) such that for 1 ≤ i ≤ m, 1 ≤ j ≤ r
$$c_{ij}=A[i,:]\,B[:,j]=[a_{i1},a_{i2},\ldots,a_{in}]\begin{bmatrix}b_{1j}\\b_{2j}\\\vdots\\b_{nj}\end{bmatrix}=a_{i1}b_{1j}+a_{i2}b_{2j}+\cdots+a_{in}b_{nj}=\sum_{k=1}^{n}a_{ik}b_{kj}.$$
Thus, AB is defined if and only if the number of columns of A = the number of rows of
B. The way matrix product is defined seems quite complicated. Most of you have already seen
it. But, we will find other ways (3 more ways) to understand this matrix multiplication. These
will be quite useful at different stages in our study. So, we need to spend enough time on it.
1 −1 " #
3 4 5
T
Example 1.3.9. Let A =
2 0 and B = −1 0 1 .
AF
0 1
DR
2. Row Method: Note that A[1, :] is a 1 × 2 matrix and B is a 2 × 3 matrix and hence
A[1, :]B is a 1 × 3 matrix. So, matrix multiplication is defined. Thus,
$$A[1,:]B=[1\ \ -1]\begin{bmatrix}3&4&5\\-1&0&1\end{bmatrix}=1\cdot[3\ \ 4\ \ 5]+(-1)\cdot[-1\ \ 0\ \ 1]=[4\ \ 4\ \ 4]$$
$$A[2,:]B=[2\ \ 0]\begin{bmatrix}3&4&5\\-1&0&1\end{bmatrix}=2\cdot[3\ \ 4\ \ 5]+0\cdot[-1\ \ 0\ \ 1]=[6\ \ 8\ \ 10]$$
$$A[3,:]B=[0\ \ 1]\begin{bmatrix}3&4&5\\-1&0&1\end{bmatrix}=0\cdot[3\ \ 4\ \ 5]+1\cdot[-1\ \ 0\ \ 1]=[-1\ \ 0\ \ 1].$$
Hence, if $A=\begin{bmatrix}A[1,:]\\A[2,:]\\A[3,:]\end{bmatrix}$ then $AB=\begin{bmatrix}A[1,:]B\\A[2,:]B\\A[3,:]B\end{bmatrix}=\begin{bmatrix}4&4&4\\6&8&10\\-1&0&1\end{bmatrix}$.
3. Column Method: Note that A is a 3 × 2 matrix and B[:, 1] is a 2 × 1 matrix and hence AB[:, 1] is a 3 × 1 matrix.
4. Matrix Method: We also have, if $A=[A[:,1]\ \ A[:,2]]$ and $B=\begin{bmatrix}B[1,:]\\B[2,:]\end{bmatrix}$, then A[:, 1]
is a 3 × 1 matrix and B[1, :] is a 1 × 3 matrix. Thus, the matrix product A[:, 1] B[1, :] is
a 3 × 3 matrix and $AB=A[:,1]B[1,:]+A[:,2]B[2,:]$.
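The four descriptions of the product can be checked numerically. The following NumPy sketch (added here for illustration; it is not part of the notes) computes AB of Example 1.3.9 entry-wise, row-wise, column-wise and as a sum of outer products.

```python
import numpy as np

# Example 1.3.9: the same product AB computed with the usual rule, the row
# method, the column method and the matrix (outer-product) method.
A = np.array([[1, -1], [2, 0], [0, 1]])
B = np.array([[3, 4, 5], [-1, 0, 1]])

AB = A @ B                                                 # usual product
rows = np.vstack([A[i, :] @ B for i in range(3)])          # row method: A[i,:] B
cols = np.column_stack([A @ B[:, j] for j in range(3)])    # column method: A B[:,j]
outer = sum(np.outer(A[:, k], B[k, :]) for k in range(2))  # sum of A[:,k] B[k,:]

assert np.array_equal(AB, rows)
assert np.array_equal(AB, cols)
assert np.array_equal(AB, outer)
print(AB)   # [[ 4  4  4] [ 6  8 10] [-1  0  1]]
```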
Remark 1.3.10. Let A ∈ Mm,n (C) and B ∈ Mn,p (C). Then the product AB is defined and
observe the following:
4. Write $A=[A[:,1]\ \cdots\ A[:,n]]$ and $B=\begin{bmatrix}B[1,:]\\\vdots\\B[n,:]\end{bmatrix}$. Then $AB=A[:,1]B[1,:]+\cdots+A[:,n]B[n,:]$.
For example, if $A=\begin{bmatrix}1\\2\\3\end{bmatrix}$ and $B=[-1\ \ 2\ \ 3]$ then $AB=\begin{bmatrix}-1&2&3\\-2&4&6\\-3&6&9\end{bmatrix}$ whereas $BA=-1+4+9=12$. As matrices, they
look quite different but it will be shown during the study of eigenvalues and eigenvectors
that they have similar structure.
$(A+B)^2=A^2+AB+BA+B^2\neq A^2+B^2+2AB$ in general.
Whereas if $C=\begin{bmatrix}2&1\\1&2\end{bmatrix}$ then $BC=CB=\begin{bmatrix}3&3\\3&3\end{bmatrix}=3A\neq A=CA$. Note that cancellation laws don't hold.
Definition 1.3.11. Two square matrices A and B are said to commute if AB = BA.
Theorem 1.3.12. Let A ∈ Mm,n (C), B ∈ Mn,p (C) and C ∈ Mp,q (C).
Using a similar argument, the next part follows. The other parts are left for the reader.
Exercise 1.3.13. 1. Let A ∈ Mn (C) and e1 , . . . , en ∈ Mn,1 (C) (see Definition 5). Then
(a) Ae1 = A[:, 1], . . . , Aen = A[:, n].
(b) eT1 A = e∗1 A = A[1, :], . . . , eTn A = e∗n A = A[n, :].
5. Let A ∈ Mm,n (C). If Ax = 0 for all x ∈ Mn,1 (C) then A = 0, the zero matrix.
6. Let A, B ∈ Mm,n (C). If Ax = Bx, for all x ∈ Mn,1 (C) then prove that A = B.
7. Let $\mathbf{x}=\begin{bmatrix}x_1\\\vdots\\x_n\end{bmatrix}$, $\mathbf{y}=\begin{bmatrix}y_1\\\vdots\\y_n\end{bmatrix}\in M_{n,1}(\mathbb{C})$. Then $\mathbf{y}^*\mathbf{x}=\sum_{i=1}^{n}\overline{y_i}\,x_i$, $\mathbf{x}^*\mathbf{x}=\sum_{i=1}^{n}|x_i|^2$,
$$\mathbf{x}\mathbf{y}^*=\begin{bmatrix}x_1\overline{y_1}&x_1\overline{y_2}&\cdots&x_1\overline{y_n}\\\vdots&\vdots&\ddots&\vdots\\x_n\overline{y_1}&x_n\overline{y_2}&\cdots&x_n\overline{y_n}\end{bmatrix}\quad\text{and}\quad \mathbf{x}\mathbf{x}^*=\begin{bmatrix}|x_1|^2&x_1\overline{x_2}&\cdots&x_1\overline{x_n}\\x_2\overline{x_1}&|x_2|^2&\cdots&x_2\overline{x_n}\\\vdots&\vdots&\ddots&\vdots\\x_n\overline{x_1}&x_n\overline{x_2}&\cdots&|x_n|^2\end{bmatrix}.$$
Note that (A − aI3 )[:, 1] = 0. So, if A[:, 1] = 0 then B[1, :] doesn’t play any role in AB.
Lemma 1.3.15. Let A ∈ Mn (C). If there exist B, C ∈ Mn (C) such that AB = In and CA = In
then B = C, i.e., If A has a left inverse and a right inverse then they are equal.
Remark 1.3.16. Lemma 1.3.15 implies that whenever A is invertible, the inverse is unique.
Thus, we denote the inverse of A by A−1 . That is, AA−1 = A−1 A = I.
1. (A−1 )−1 = A.
2. $(AB)^{-1}=B^{-1}A^{-1}$.
3. Find the inverse of $\begin{bmatrix}\cos\theta&\sin\theta\\-\sin\theta&\cos\theta\end{bmatrix}$ and $\begin{bmatrix}\cos\theta&-\sin\theta\\\sin\theta&\cos\theta\end{bmatrix}$.
5. Determine A that satisfies $(I+3A)^{-1}=\begin{bmatrix}1&2\\2&1\end{bmatrix}$.
6. Let A be an invertible matrix satisfying $A^3+A-2I=0$. Then $A^{-1}=\frac{1}{2}\left(A^2+I\right)$.
7. Let A = [aij ] be an invertible matrix and B = [pi−j aij ], for some p ∈ C, p 6= 0. Then
B −1 = [pi−j (A−1 )ij ].
Then, the matrices ek` for 1 ≤ k ≤ m and 1 ≤ ` ≤ n are called the standard basis
elements for Mm,n (C).
So, if $e_{k\ell}\in M_{2,3}(\mathbb{C})$ then $e_{11}=\begin{bmatrix}1&0&0\\0&0&0\end{bmatrix}=\begin{bmatrix}1\\0\end{bmatrix}[1\ 0\ 0]$, $e_{12}=\begin{bmatrix}0&1&0\\0&0&0\end{bmatrix}=\begin{bmatrix}1\\0\end{bmatrix}[0\ 1\ 0]$
and $e_{22}=\begin{bmatrix}0&0&0\\0&1&0\end{bmatrix}=\begin{bmatrix}0\\1\end{bmatrix}[0\ 1\ 0]$.
In particular, if eij ∈ Mn (C) then eij = ei eTj = ei e∗j , for 1 ≤ i, j ≤ n.
(d) A is said to be a permutation matrix if A has exactly one non-zero entry, namely
1, in each row and column. For example, $I_n$ for each positive integer n, $\begin{bmatrix}0&1\\1&0\end{bmatrix}$,
$\begin{bmatrix}0&1&0\\0&0&1\\1&0&0\end{bmatrix}$, $\begin{bmatrix}0&0&1\\0&1&0\\1&0&0\end{bmatrix}$ and $\begin{bmatrix}0&1&0\\1&0&0\\0&0&1\end{bmatrix}$ are permutation matrices. Verify that permutation matrices are orthogonal matrices.
6. An idempotent matrix which is also Hermitian is called a projection matrix. For example,
if u ∈ Mn,1 (C) is a unit vector then A = uu∗ is a Hermitian, idempotent matrix. Thus A
is a projection matrix.
In particular, if u ∈ Mn,1 (R) is a unit vector then A = uuT is a projection matrix. Verify that uT (x − Ax) =
uT x − uT Ax = uT x − uT (uuT )x = 0 (as uT u = 1), for any x ∈ Rn. Thus, with respect
to the dot product, Ax is the foot of the perpendicular from the point x on the
vector u. In particular, if $\mathbf{u}=\frac{1}{\sqrt{6}}[1,2,-1]^T$ and $A=\mathbf{u}\mathbf{u}^T$ then, for any vector
$\mathbf{x}=[x_1,x_2,x_3]^T\in M_{3,1}(\mathbb{R})$,
$$A\mathbf{x}=(\mathbf{u}\mathbf{u}^T)\mathbf{x}=\mathbf{u}(\mathbf{u}^T\mathbf{x})=\frac{x_1+2x_2-x_3}{\sqrt{6}}\,\mathbf{u}=\frac{x_1+2x_2-x_3}{6}\,[1,2,-1]^T.$$
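The claims about the projection matrix A = uuT are easy to check numerically. The following short NumPy sketch (an illustration added here, not part of the notes) verifies idempotence and the orthogonality of x − Ax to u for the unit vector above.

```python
import numpy as np

# Projection onto the line through u = [1, 2, -1]/sqrt(6).
u = np.array([1.0, 2.0, -1.0]) / np.sqrt(6)
A = np.outer(u, u)

x = np.array([3.0, -1.0, 2.0])          # any test vector
Ax = A @ x
print(np.allclose(A @ A, A))            # True: A is idempotent
print(np.allclose(u @ (x - Ax), 0.0))   # True: x - Ax is perpendicular to u
print(np.allclose(Ax, (x @ u) * u))     # True: Ax = (u^T x) u
```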
7. Fix a unit vector u ∈ Mn,1 (R) and let A = 2uuT − In . Then, verify that A ∈ Mn (R) and
Ay = 2(uT y)u − y, for all y ∈ Rn . This matrix is called the reflection matrix about the
line, say `, containing the points 0 and u. This matrix fixes each point on the line ` and
sends every vector v that is orthogonal to u to −v.
Exercise 1.4.2. 1. Consider the matrices eij ∈ Mn (C) for 1 ≤ i, j ≤ n. Is e12 e11 = e11 e12 ?
What about e12 e22 and e22 e12 ?
2. Let {u1 , u2 , u3 } be three vectors in R3 such that u∗i ui = 1, for 1 ≤ i ≤ 3, and u∗i uj = 0
whenever i 6= j. Prove the following.
4. Prove that in M5 (R), there are infinitely many orthogonal matrices of which only finitely
many are diagonal (in fact, their number is just 32).
6. Let A, B ∈ Mn (C) be two unitary matrices. Then both AB and BA are unitary matrices.
8. Let A ∈ Mn (C). If x∗ Ax ∈ R for every x ∈ Mn,1 (C) then A is a Hermitian matrix. [Hint:
Use ej , ej + ek and ej + iek of Mn,1 (C) for x.]
11. Let A ∈ Mn (C). Then A = S1 + S2, where $S_1=\frac{1}{2}(A+A^*)$ is Hermitian and $S_2=\frac{1}{2}(A-A^*)$
is skew-Hermitian.
14. Let $A=\begin{bmatrix}1&0&0\\0&\cos\theta&-\sin\theta\\0&\sin\theta&\cos\theta\end{bmatrix}$ and $B=\begin{bmatrix}1&0&0\\0&\cos\theta&\sin\theta\\0&\sin\theta&-\cos\theta\end{bmatrix}$, for θ ∈ [−π, π). Are they
orthogonal?
Let A ∈ Mn,m (C) and B ∈ Mm,p (C). Then the product AB is defined. Suppose r < m.
Then A and B can be decomposed as $A=[P\ \ Q]$ and $B=\begin{bmatrix}H\\K\end{bmatrix}$, where P ∈ Mn,r (C) and
H ∈ Mr,p (C), so that AB = P H + QK. This is proved next.
AB = P H + QK.
Proof. Verify that the matrix products P H and QK are valid. Further, their sum is defined
as P H, QK ∈ Mn,p (C). Now, let P = [Pij ], Q = [Qij ], H = [Hij ], and K = [Kij ]. Then, for
1 ≤ i ≤ n and 1 ≤ j ≤ p, we have
$$(AB)_{ij}=\sum_{k=1}^{m}a_{ik}b_{kj}=\sum_{k=1}^{r}a_{ik}b_{kj}+\sum_{k=r+1}^{m}a_{ik}b_{kj}=\sum_{k=1}^{r}P_{ik}H_{kj}+\sum_{k=r+1}^{m}Q_{ik}K_{kj}$$
Remark 1.5.4. Theorem 1.5.3 is very useful due to the following reasons:
1. The matrices P, Q, H and K can be further partitioned so as to form blocks that are either
identity or zero or have certain nice properties. So, such partitions are useful during
different matrix operations.
" # Examples of such partitions
" # appear throughout the notes. For
Ir 0 h i Q1
example, let A = , P = P1 P2 and Q = . Then, verify that P AQ = P1 Q1 .
0 0 Q2
This is similar to the understanding that
$$[x_1\ \ x_2]\begin{bmatrix}a_{11}&a_{12}\\a_{21}&a_{22}\end{bmatrix}\begin{bmatrix}y_1\\y_2\end{bmatrix}=x_1a_{11}y_1+x_1a_{12}y_2+x_2a_{21}y_1+x_2a_{22}y_2.$$
2. Suppose one wants to prove a result for a square matrix A. If we want to prove it using
induction then we can prove it for the 1 × 1 matrix (the initial step of induction). Then
assume the result to hold for all k × k sub-matrices of A or just the first k × k principal
sub-matrix of A. At the next step write $A=\begin{bmatrix}B&\mathbf{x}\\\mathbf{x}^T&a\end{bmatrix}$, where B is a k × k matrix. Then
the result holds for B and then one can proceed to prove it for A.
4. For An×n = [aij ], the trace of A, denoted tr(A), is defined by tr(A) = a11 + a22 + · · · + ann .
(a) Compute tr(A) for $A=\begin{bmatrix}3&2\\2&2\end{bmatrix}$ and $A=\begin{bmatrix}4&-3\\-5&1\end{bmatrix}$.
(b) Let A be a matrix with $A\begin{bmatrix}1\\2\end{bmatrix}=2\begin{bmatrix}1\\2\end{bmatrix}$ and $A\begin{bmatrix}1\\-2\end{bmatrix}=3\begin{bmatrix}1\\-2\end{bmatrix}$. Determine tr(A).
(c) Let A and B be two square matrices of the same order. Then prove that tr(AB) = tr(BA).
(d) Do there exist matrices A, B ∈ Mn (C) such that AB − BA = cI, for some c ≠ 0?
(a) Verify that J = 11T , where 1 is a column vector having all entries 1.
(b) Verify that J 2 = nJ.
(c) Also, for any α1 , α2 , β1 , β2 ∈ R, verify that there exist α3 , β3 ∈ R such that
(α1 In + β1 J) · (α2 In + β2 J) = α3 In + β3 J.
(d) Let α, β ∈ R such that α 6= 0 and α + nβ 6= 0. Now, define A = αIn + βJ. Then,
use the above to prove that A is invertible.
6. Suppose the matrices B and C are invertible and the involved partitioned products are
defined, then verify that
$$\begin{bmatrix}A&B\\C&0\end{bmatrix}^{-1}=\begin{bmatrix}0&C^{-1}\\B^{-1}&-B^{-1}AC^{-1}\end{bmatrix}.$$
7. Let $A=\begin{bmatrix}A_{11}&\mathbf{x}\\\mathbf{y}^*&c\end{bmatrix}$, where A11 ∈ Mn (C) is invertible and c ∈ C.
(a) If $p=c-\mathbf{y}^*A_{11}^{-1}\mathbf{x}$ is non-zero, then verify that
$$A^{-1}=\begin{bmatrix}A_{11}^{-1}&0\\0&0\end{bmatrix}+\frac{1}{p}\begin{bmatrix}A_{11}^{-1}\mathbf{x}\\-1\end{bmatrix}\begin{bmatrix}\mathbf{y}^*A_{11}^{-1}&-1\end{bmatrix}.$$
(b) Use the above to find the inverse of $\begin{bmatrix}0&-1&2\\1&1&4\\-2&1&1\end{bmatrix}$ and $\begin{bmatrix}0&-1&2\\3&1&4\\-2&5&-3\end{bmatrix}$.
9. Let A ∈ Mn (R) be an invertible matrix and let x, y ∈ Mn,1 (R). Also, let β ∈ R be such that
α = 1 + βyT A−1 x ≠ 0. Then, verify the famous Sherman-Morrison formula
$$(A+\beta\mathbf{x}\mathbf{y}^T)^{-1}=A^{-1}-\frac{\beta}{\alpha}A^{-1}\mathbf{x}\mathbf{y}^TA^{-1}.$$
This formula gives information about the inverse when an invertible matrix is modified by a rank-one matrix (a small numerical check appears after this exercise set).
10. Let A ∈ Mm,n (C). Then, a matrix G ∈ Mn,m (C) is called a generalized inverse (for short, g-inverse) of A if AGA = A.
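The following NumPy sketch (added for illustration, not part of the notes) checks the Sherman-Morrison formula of item 9 for one randomly chosen A, x, y and β.

```python
import numpy as np

# Numerical check of the Sherman-Morrison formula.
rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned, invertible
x = rng.standard_normal(n)
y = rng.standard_normal(n)
beta = 0.7

Ainv = np.linalg.inv(A)
alpha = 1 + beta * (y @ Ainv @ x)                 # assumed non-zero
lhs = np.linalg.inv(A + beta * np.outer(x, y))
rhs = Ainv - (beta / alpha) * Ainv @ np.outer(x, y) @ Ainv
print(np.allclose(lhs, rhs))                      # True
```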
1.6 Summary
In this chapter, we started with the definition of a matrix and came across lots of examples.
We recall these examples as they will be used in later chapters to relate different ideas:
3. Triangular matrices.
1.6. SUMMARY 25
4. Hermitian/Symmetric matrices.
5. Skew-Hermitian/skew-symmetric matrices.
6. Unitary/Orthogonal matrices.
7. Idempotent matrices.
8. Nilpotent matrices.
We also learnt the product of two matrices. Even though it seemed complicated, it basically
tells us that multiplying by a matrix on the left corresponds to operating on its rows, while multiplying on the right corresponds to operating on its columns.
Matrix multiplication is not commutative. We also defined the inverse of a matrix. Further,
there were exercises that inform us that the rows and columns of invertible matrices cannot
have certain properties.
Chapter 2
System of Linear Equations
2.1 Introduction
We start this section with our understanding of the system of linear equations.
2. Recall that the linear system ax + by = c for (a, b) 6= (0, 0), in the variables x and y,
represents a line in R2 . So, let us consider the points of intersection of the two lines
a1 x + b1 y = c1 , a2 x + b2 y = c2 , (2.1.1)
where a1 , a2 , b1 , b2 , c1 , c2 ∈ R with (a1 , b1 ), (a2 , b2 ) 6= (0, 0) (see Figure 2.1 for illustration
of different cases).
Figure 2.1: the lines ℓ1 and ℓ2 as a pair of parallel lines (no solution), coincident lines (infinite number of solutions) and intersecting lines (unique solution; P is the point of intersection).
(a) Unique Solution (a1 b2 − a2 b1 ≠ 0): The linear system x − y = 3 and 2x + 3y = 11
has $\begin{bmatrix}x\\y\end{bmatrix}=\begin{bmatrix}4\\1\end{bmatrix}$ as the unique solution.
Example 2.1.2. Observe the following for the linear system in Example 2.1.1.2a.
1. $\begin{bmatrix}4\\1\end{bmatrix}$ corresponds to the point of intersection of the corresponding two lines.
2. Using matrix multiplication, the given system equals $A\mathbf{x}=\mathbf{b}$, where $A=\begin{bmatrix}1&-1\\2&3\end{bmatrix}$, $\mathbf{x}=\begin{bmatrix}x\\y\end{bmatrix}$ and $\mathbf{b}=\begin{bmatrix}3\\11\end{bmatrix}$.
Thus, there are three ways of looking at the linear system Ax = b, where, as the name
suggests, one of the ways is looking at the point of intersection of planes, the other is the vector
sum approach and the third is the matrix multiplication approach. We will see that all the
three approaches are fundamental to the understanding of linear algebra.
where for 1 ≤ i ≤ m and 1 ≤ j ≤ n; aij , bi ∈ R. The linear system (2.1.2) is called homoge-
neous if b1 = 0 = b2 = · · · = bm and non-homogeneous, otherwise.
Definition 2.1.4. Let $A=\begin{bmatrix}a_{11}&a_{12}&\cdots&a_{1n}\\a_{21}&a_{22}&\cdots&a_{2n}\\\vdots&\vdots&\ddots&\vdots\\a_{m1}&a_{m2}&\cdots&a_{mn}\end{bmatrix}$, $\mathbf{x}=\begin{bmatrix}x_1\\\vdots\\x_n\end{bmatrix}$ and $\mathbf{b}=\begin{bmatrix}b_1\\\vdots\\b_m\end{bmatrix}$. Then, Equation (2.1.2) can be re-written as $A\mathbf{x}=\mathbf{b}$, where A is called the coefficient matrix and the
block matrix [A b] is called the augmented matrix.
For example, $A\mathbf{x}=\mathbf{b}$, with $A=\begin{bmatrix}1&1&1\\1&4&2\\4&1&1\end{bmatrix}$ and $\mathbf{b}=\begin{bmatrix}1\\0\\1\end{bmatrix}$, has $\left\{\begin{bmatrix}0\\-1\\2\end{bmatrix}\right\}$ as the solution set.
Similarly, $A=\begin{bmatrix}1&1\\1&2\end{bmatrix}$ and $\mathbf{b}=\begin{bmatrix}2\\3\end{bmatrix}$ has $\left\{\begin{bmatrix}1\\1\end{bmatrix}\right\}$ as the solution set. Further, they are consistent
systems. Whereas, the system x + y = 2, 2x + 2y = 3 is inconsistent (has no solution).
Definition 2.1.6. For the linear system Ax = b the corresponding linear homogeneous system
Ax = 0 is called the associated homogeneous system.
The readers are advised to supply the proof of the next remark.
Remark 2.1.7. Consider the linear system Ax = b with two distinct solutions, say u and v.
2. Thus, any two distinct solutions of Ax = b differ by a solution of the associated homogeneous system Ax = 0, i.e., {x0 + xh } is the solution set of Ax = b with x0 as a particular
solution and xh a solution of the associated homogeneous system Ax = 0.
2. Give a linear system of 3 equations in 2 variables such that the system is inconsistent
whereas it has 2 equations which form a consistent system.
3. Give a linear system of 4 equations in 3 variables such that the system is inconsistent
whereas it has three equations which form a consistent system.
To proceed with the understanding of the solution set of a system of linear equations, we start
with the definition of a pivot.
Definition 2.2.1. Let A be a non-zero matrix. Then, in each non-zero row of A, the left most
non-zero entry is called a pivot/leading entry. The column containing the pivot is called a
pivotal column.
If $a_{ij}$ is a pivot then we denote it by $\boxed{a_{ij}}$. For example, the entries a12 and a23 are pivots
in $A=\begin{bmatrix}0&\boxed{3}&4&2\\0&0&\boxed{2}&1\\0&0&0&0\end{bmatrix}$. Thus, columns 2 and 3 are pivotal columns.
Definition 2.2.2. A matrix is in row echelon form (REF) (staircase/ladder like)
1. if all its zero rows, if any, are at the bottom, and
2. if the pivot of the (i + 1)-th row, if it exists, comes to the right of the pivot of the i-th
row.
We now start with solving two systems of linear equations. The idea is to manipulate the
rows of the augmented matrix in place of the linear equations themselves. Since multiplying
a matrix on the left corresponds to row operations, we left multiply the augmented matrix
by certain matrices so that the final matrix is in row echelon form (REF). The process of
obtaining the REF of a matrix is called the Gauss Elimination method. The readers should
carefully look at the matrices being multiplied on the left in the examples given below.
Example 2.2.4. 1. Solve the linear system y + z = 2, 2x + 3z = 5, x + y + z = 3.
Solution: Let $B_0=[A\ \ \mathbf{b}]=\begin{bmatrix}0&1&1&2\\2&0&3&5\\1&1&1&3\end{bmatrix}$ be the augmented matrix. Then
(a) Interchange the 1-st and 2-nd equations (interchange B0 [1, :] and B0 [2, :] to get B1 ).
$$\begin{array}{l}2x+3z=5\\y+z=2\\x+y+z=3\end{array}\qquad B_1=\begin{bmatrix}0&1&0\\1&0&0\\0&0&1\end{bmatrix}B_0=\begin{bmatrix}2&0&3&5\\0&1&1&2\\1&1&1&3\end{bmatrix}.$$
(b) In the new system, replace the 3-rd equation by the 3-rd equation minus $\frac{1}{2}$ times the 1-st
equation (replace B1 [3, :] by B1 [3, :] − $\frac{1}{2}$B1 [1, :] to get B2 ).
$$\begin{array}{l}2x+3z=5\\y+z=2\\y-\tfrac{1}{2}z=\tfrac{1}{2}\end{array}\qquad B_2=\begin{bmatrix}1&0&0\\0&1&0\\-1/2&0&1\end{bmatrix}B_1=\begin{bmatrix}2&0&3&5\\0&1&1&2\\0&1&-1/2&1/2\end{bmatrix}.$$
(c) In the new system, replace 3-rd equation by 3-rd equation minus 2-nd equation
(replace B2 [3, :] by B2 [3, :] − B2 [2, :] to get B3 ).
$$\begin{array}{l}2x+3z=5\\y+z=2\\-\tfrac{3}{2}z=-\tfrac{3}{2}\end{array}\qquad B_3=\begin{bmatrix}1&0&0\\0&1&0\\0&-1&1\end{bmatrix}B_2=\begin{bmatrix}2&0&3&5\\0&1&1&2\\0&0&-3/2&-3/2\end{bmatrix}.$$
Observe that the matrix B3 is in REF. Using the last row of B3 , we get z = 1. Using
this and the second row of B3 gives y = 1. Finally, the first row gives x = 1. Hence,
the solution set of Ax = b is {[x, y, z]T | [x, y, z] = [1, 1, 1]}, a unique solution. The
method of finding the values of the unknowns y and x, using the 2-nd and 1-st row of B3
and the value of z is called back substitution.
2. Solve the linear system x + y + z = 4, 2x + 3z = 5, y + z = 3.
Solution: Let $B_0=[A\ \ \mathbf{b}]=\begin{bmatrix}1&1&1&4\\2&0&3&5\\0&1&1&3\end{bmatrix}$ be the augmented matrix. Then
(a) The given system corresponds to the augmented matrix B0:
$$\begin{array}{l}x+y+z=4\\2x+3z=5\\y+z=3\end{array}\qquad B_0=\begin{bmatrix}1&1&1&4\\2&0&3&5\\0&1&1&3\end{bmatrix}.$$
(b) In the new system, replace 2-nd equation by 2-nd equation minus 2 times the 1-st
equation (replace B0 [2, :] by B0 [2, :] − 2 · B0 [1, :] to get B1 ).
$$\begin{array}{l}x+y+z=4\\-2y+z=-3\\y+z=3\end{array}\qquad B_1=\begin{bmatrix}1&0&0\\-2&1&0\\0&0&1\end{bmatrix}B_0=\begin{bmatrix}1&1&1&4\\0&-2&1&-3\\0&1&1&3\end{bmatrix}.$$
(c) In the new system, replace 3-rd equation by 3-rd equation plus 1/2 times the 2-nd
equation (replace B1 [3, :] by B1 [3, :] + 1/2 · B1 [2, :] to get B2 ).
$$\begin{array}{l}x+y+z=4\\-2y+z=-3\\\tfrac{3}{2}z=\tfrac{3}{2}\end{array}\qquad B_2=\begin{bmatrix}1&0&0\\0&1&0\\0&1/2&1\end{bmatrix}B_1=\begin{bmatrix}1&1&1&4\\0&-2&1&-3\\0&0&3/2&3/2\end{bmatrix}.$$
Observe that the matrix B2 is in REF. Verify that the solution set is {[x, y, z]T | [x, y, z] =
[1, 2, 1]}, again a unique solution.
5. Also, for each matrix E note that we have a matrix F , again a variant of I3 such that
EF = I3 = F E.
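The elimination in part 2 above can be reproduced by left-multiplying with the elementary matrices. The NumPy sketch below (an illustration added here, not part of the notes) builds the two elementary matrices, forms the REF and then solves the resulting triangular system.

```python
import numpy as np

# Gauss elimination of part 2 as a product of elementary matrices.
B0 = np.array([[1., 1., 1., 4.],
               [2., 0., 3., 5.],
               [0., 1., 1., 3.]])

E21 = np.eye(3); E21[1, 0] = -2.0      # replace row 2 by row 2 - 2*row 1
E32 = np.eye(3); E32[2, 1] = 0.5       # replace row 3 by row 3 + (1/2)*row 2

B2 = E32 @ (E21 @ B0)
print(B2)   # REF: [[1 1 1 4], [0 -2 1 -3], [0 0 1.5 1.5]]

# solve the triangular system [A | b] = B2 (back substitution done by the solver)
A, b = B2[:, :3], B2[:, 3]
print(np.linalg.solve(A, b))   # [1. 2. 1.]
```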
We use the above ideas to define elementary row operations and the corresponding elemen-
tary matrices in the next subsection.
Definition 2.2.5. Let A ∈ Mm,n (C). Then, the elementary row operations are
1. Eij : Interchange the i-th and j-th rows, namely, interchange A[i, :] and A[j, :].
2. Ek (c) for c ≠ 0: Multiply the k-th row by c, namely, replace A[k, :] by cA[k, :].
3. Eij (c) for c ≠ 0: Replace the i-th row by the i-th row plus c-times the j-th row, namely,
replace A[i, :] by A[i, :] + cA[j, :].
For m = 3, the operations and the corresponding elementary matrices (the same operation applied to I3) are:
E2(c), c ≠ 0 (in general Ek(c), c ≠ 0): multiply the 2-nd row by c, i.e., A[2, :] ← cA[2, :]; elementary matrix $\begin{bmatrix}1&0&0\\0&c&0\\0&0&1\end{bmatrix}$.
E21(c), c ≠ 0 (in general Eij(c), c ≠ 0): replace the 2-nd row by the 2-nd row plus c-times the 1-st row, i.e., A[2, :] ← A[2, :] + cA[1, :]; elementary matrix $\begin{bmatrix}1&0&0\\c&1&0\\0&0&1\end{bmatrix}$.
E23 (in general Eij): interchange the 2-nd and 3-rd rows, i.e., interchange A[2, :] and A[3, :]; elementary matrix $\begin{bmatrix}1&0&0\\0&0&1\\0&1&0\end{bmatrix}$.
Example 2.2.7. Verify that $E_2(5)=\begin{bmatrix}1&0&0\\0&5&0\\0&0&1\end{bmatrix}$, $E_{12}(-5)=\begin{bmatrix}1&-5&0\\0&1&0\\0&0&1\end{bmatrix}$ and $E_{13}=\begin{bmatrix}0&0&1\\0&1&0\\1&0&0\end{bmatrix}$
are elementary matrices.
Exercise 2.2.8. 1. Which of the following matrices are elementary?
$$\begin{bmatrix}2&0&1\\0&1&0\\0&0&1\end{bmatrix},\ \begin{bmatrix}\tfrac{1}{2}&0&0\\0&1&0\\0&0&1\end{bmatrix},\ \begin{bmatrix}1&-1&0\\0&1&0\\0&0&1\end{bmatrix},\ \begin{bmatrix}1&0&0\\5&1&0\\0&0&1\end{bmatrix},\ \begin{bmatrix}0&0&1\\0&1&0\\1&0&0\end{bmatrix},\ \begin{bmatrix}0&0&1\\1&0&0\\0&1&0\end{bmatrix}.$$
2. Find some elementary matrices $E_1,\ldots,E_k$ such that $E_k\cdots E_1\begin{bmatrix}2&1\\1&2\end{bmatrix}=I_2$.
Example 2.2.9. Let e1 , . . . , en be the standard unit vectors of Mn,1 (R). Then, using eTi ej =
0 = eTj ei and eTi ei = 1 = eTj ej , verify that each elementary matrix is invertible.
For example, $E_{ij}E_{ij}=\left(I_n-e_ie_i^T-e_je_j^T+e_ie_j^T+e_je_i^T\right)\left(I_n-e_ie_i^T-e_je_j^T+e_ie_j^T+e_je_i^T\right)=I_n$.
We now show that the above elementary matrices correspond to respective row operations.
1. For c ≠ 0, Ek (c)A corresponds to multiplying A[k, :] by c. Using $e_k^Te_k=1$ and $A[k,:]=e_k^TA$, we get
$$(E_k(c)A)[k,:]=e_k^T(E_k(c)A)=e_k^T\left(I_m+(c-1)e_ke_k^T\right)A=\left(e_k^T+(c-1)e_k^T(e_ke_k^T)\right)A=c\,e_k^TA=c\,A[k,:].$$
A similar argument with $e_i^Te_k=0$, for i ≠ k, gives (Ek (c)A)[i, :] = A[i, :], for i ≠ k.
2. For c 6= 0, Eij (c)A corresponds to the replacement of A[i, :] by A[i, :] + cA[j, :].
Using eTi ei = 1 and A[i, :] = eTi A, we get
$$(E_{ij}(c)A)[i,:]=e_i^T(E_{ij}(c)A)=e_i^T\left(I_m+c\,e_ie_j^T\right)A=\left(e_i^T+c\,e_i^T(e_ie_j^T)\right)A=\left(e_i^T+c\,e_j^T\right)A=A[i,:]+cA[j,:].$$
A similar argument with eTk ei = 0, for k 6= i, gives (Eij (c)A)[k, :] = A[k, :], for k 6= i.
3. Eij A corresponds to interchange of A[i, :] and A[j, :].
Using $e_i^Te_i=1$, $e_i^Te_j=0$ and $A[i,:]=e_i^TA$, we get
$$(E_{ij}A)[i,:]=e_i^T\left(I_m-e_ie_i^T-e_je_j^T+e_ie_j^T+e_je_i^T\right)A=e_j^TA=A[j,:].$$
Similarly, using eTj ej = 1, eTj ei = 0 and A[j, :] = eTj A show that (Eij A)[j, :] = A[i, :].
Further, using eTk ei = 0 = eTk ej , for k 6= i, j show that (Eij A)[k, :] = A[k, :].
Definition 2.2.11. Two matrices A and B are said to be row equivalent if one can be
obtained from the other by a finite number of elementary row operations. Or equivalently,
there exist elementary matrices E1 , . . . , Ek such that B = E1 · · · Ek A.
Definition 2.2.12. The linear systems Ax = b and Cx = d are said to be row equivalent if
their respective augmented matrices, [A b] and [C d], are row equivalent.
Thus, note that the linear systems at each step in Example 2.2.4 are row equivalent to each
other. We now prove that the solution set of two row equivalent linear systems are same.
Theorem 2.2.13. Let Ax = b and Cx = d be two row equivalent linear systems. Then they
have the same solution set.
Proof. Since the systems are row equivalent, there exists an invertible matrix E (a product of elementary matrices) such that
$$EA=C,\quad E\mathbf{b}=\mathbf{d},\quad A=E^{-1}C\quad\text{and}\quad \mathbf{b}=E^{-1}\mathbf{d}.\tag{2.2.3}$$
Now, if y is a solution of Ax = b then
$$C\mathbf{y}=(EA)\mathbf{y}=E(A\mathbf{y})=E\mathbf{b}=\mathbf{d}.\tag{2.2.4}$$
Similarly, if z is a solution of Cx = d then
$$A\mathbf{z}=(E^{-1}C)\mathbf{z}=E^{-1}(C\mathbf{z})=E^{-1}\mathbf{d}=\mathbf{b}.\tag{2.2.5}$$
Therefore, using Equations (2.2.4) and (2.2.5), the required result follows.
Corollary 2.2.14. Let A and B be two row equivalent matrices. Then, the systems Ax = 0 and Bx = 0 have the same solution set.
The following exercise shows that every square matrix is row equivalent to an upper trian-
gular matrix. We will come back to this idea again in the chapter titled “Advanced Topics”.
Exercise 2.2.16. 1. Let A = [aij ] ∈ Mn (R). Then there exists an orthogonal matrix U
such that U A is upper triangular. The proof uses the following ideas.
(a) If A[:, 1] = 0 then proceed to the next column. Else, A[:, 1] ≠ 0.
(b) If A[:, 1] = αe1 , for some α ∈ R, α 6= 0, proceed to the next column. Else, either
a11 = 0 or a11 6= 0.
(c) If a11 = 0 then left multiply A with E1i (an orthogonal matrix) so that the (1, 1)
entry of B = E1i A is non-zero. Hence, without loss of generality, let a11 6= 0.
(d) Let [w1 , . . . , wn ]T = w ∈ Rn with w1 6= 0. Then use the Householder matrix H such
that Hw = w1 e1 , i.e., find x ∈ Rn such that (In − 2xxT )w = w1 e1 .
(e) So, Part 1d gives an orthogonal matrix H1 with $H_1A=\begin{bmatrix}w_1&*\\0&A_1\end{bmatrix}$.
(f) Use induction to get H2 ∈ Mn−1 (R) satisfying H2 A1 = T1 , an upper triangular matrix.
(g) Define $H=\begin{bmatrix}1&\mathbf{0}^T\\\mathbf{0}&H_2\end{bmatrix}H_1$. Then H is an orthogonal matrix and $HA=\begin{bmatrix}w_1&*\\0&T_1\end{bmatrix}$, an
upper triangular matrix.
2. Let A ∈ Mn (R) such that tr(A) = 0. Then prove that there exists a non-singular matrix
S such that SAS −1 = B with B = [bij ] and bii = 0, for 1 ≤ i ≤ n.
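The idea of Exercise 2.2.16.1 can be turned into a short computation. The sketch below (not part of the notes) uses the standard Householder choice x = v/||v|| with v = w − ||w||e1, which is one concrete way of realising Part 1d.

```python
import numpy as np

def householder_triangularize(A):
    """Return an orthogonal U with U @ A upper triangular (sketch of Ex. 2.2.16.1)."""
    A = A.astype(float).copy()
    n = A.shape[0]
    U = np.eye(n)
    for k in range(n - 1):
        w = A[k:, k]
        alpha = np.linalg.norm(w)
        if alpha == 0:
            continue                           # nothing to eliminate in this column
        v = w.copy()
        v[0] -= alpha                          # v = w - ||w|| e1
        if np.linalg.norm(v) == 0:
            continue                           # column is already alpha * e1
        x = v / np.linalg.norm(v)
        Hk = np.eye(n)
        Hk[k:, k:] -= 2.0 * np.outer(x, x)     # Householder block I - 2 x x^T
        A = Hk @ A
        U = Hk @ U
    return U, A

U, T = householder_triangularize(np.array([[4., 1., 2.], [2., 3., 0.], [2., 0., 1.]]))
print(np.allclose(U @ U.T, np.eye(3)))         # True: U is orthogonal
print(np.allclose(np.tril(T, -1), 0))          # True: U A is upper triangular
```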
1. Observe that solving the system Ax = b is quite easy whenever A is a triangular matrix.
Hence, we observe that solving the system Ax = b reduces to solving two easier linear systems,
namely Ly = b and U z = y, where y is obtained as a solution of Ly = b.
To give the LU -decomposition for a square matrix A, we need to know the determinant of A,
namely det(A), and its properties. Since, we haven’t yet studied it, we just give the idea of the
LU -decomposition. For the general case, the readers should see the chapter titled “Advanced
Topics". Let us start with a few examples.
Example 2.3.1. 1. Let $A=\begin{bmatrix}0&1\\1&0\end{bmatrix}$. Then A cannot be decomposed into LU.
For if $A=LU=\begin{bmatrix}a&0\\b&c\end{bmatrix}\begin{bmatrix}e&f\\0&g\end{bmatrix}$ then the numbers a, b, c, e, f, g ∈ R satisfy
ae = 0, af = 1, be = 1 and bf + cg = 0. But af = 1 forces a ≠ 0 and be = 1 forces e ≠ 0, so ae ≠ 0, a contradiction.
6. Finally, using A = LU, the system Ax = b reduces to LUx = b. Here, the solution of
$L\mathbf{y}=\mathbf{b}$, for $\mathbf{b}=\begin{bmatrix}4\\5\\3\end{bmatrix}$, equals $\mathbf{y}=\begin{bmatrix}4\\-3\\3/2\end{bmatrix}$. This, in turn, implies $\mathbf{x}=\begin{bmatrix}1\\2\\1\end{bmatrix}$ as the solution of
both Ux = y and Ax = b.
So, to proceed further, let A ∈ Mn (R). Then, recall that for any S ⊆ {1, 2, . . . , n}, A[S, S]
denotes the principal submatrix of A corresponding to the elements of S (see Page 21). Then,
we assume that det(A[S, S]) 6= 0, for every S = {1, 2, . . . , i}, 1 ≤ i ≤ n.
We need to show that there exists an invertible lower triangular matrix L such that LA is
an invertible upper triangular matrix. The proof uses the following ideas.
1. By assumption, A[1, 1] = a11 ≠ 0. Write $A=\begin{bmatrix}a_{11}&A_{12}\\A_{21}&A_{22}\end{bmatrix}$, where A22 is an (n − 1) × (n − 1) matrix.
2. Let $L_1=\begin{bmatrix}1&\mathbf{0}^T\\\mathbf{x}&I_{n-1}\end{bmatrix}$, where $\mathbf{x}=\dfrac{-1}{a_{11}}A_{21}$. Then L1 is a lower triangular matrix and
$$L_1A=\begin{bmatrix}1&\mathbf{0}^T\\\mathbf{x}&I_{n-1}\end{bmatrix}\begin{bmatrix}a_{11}&A_{12}\\A_{21}&A_{22}\end{bmatrix}=\begin{bmatrix}a_{11}&A_{12}\\a_{11}\mathbf{x}+A_{21}&\mathbf{x}A_{12}+A_{22}\end{bmatrix}=\begin{bmatrix}a_{11}&A_{12}\\0&\mathbf{x}A_{12}+A_{22}\end{bmatrix}.$$
3. Note that the (2, 2)-th entry of L1 A equals the (1, 1)-th entry of xA12 + A22 . This equals
$$\left(\frac{-1}{a_{11}}\begin{bmatrix}a_{21}\\\vdots\\a_{n1}\end{bmatrix}\begin{bmatrix}a_{12}&\cdots&a_{1n}\end{bmatrix}\right)_{11}+(A_{22})_{11}=\frac{a_{11}a_{22}-a_{12}a_{21}}{a_{11}}=\frac{\det(A[\{1,2\},\{1,2\}])}{a_{11}}\neq 0.$$
4. Thus, L1 is an invertible lower triangular matrix with $L_1A=\begin{bmatrix}a_{11}&*\\0&A_1\end{bmatrix}$ and (A1)11 ≠ 0.
Hence, det(A) = a11 det(A1) and det(A1[S, S]) ≠ 0, for all S ⊆ {1, 2, . . . , n − 1}, as
(a) the determinant of a lower triangular matrix equals the product of its diagonal entries, and
(b) if A and B are two n × n matrices then det(AB) = det(A) · det(B).
5. Now, using induction, we get L2 ∈ Mn−1 (R), an invertible lower triangular matrix, with
1's on the diagonal, such that L2 A1 = T1 , an invertible upper triangular matrix.
6. Define $\widetilde{L}=\begin{bmatrix}1&\mathbf{0}^T\\\mathbf{0}&L_2\end{bmatrix}L_1$. Then, verify that $\widetilde{L}A=\begin{bmatrix}a_{11}&*\\0&T_1\end{bmatrix}$ is an upper triangular matrix,
with $\widetilde{L}$ an invertible lower triangular matrix.
7. Defining $L=\widetilde{L}^{-1}$, we see that L is a lower triangular matrix (the inverse of a lower triangular matrix is lower triangular) with A = LU and $U=\begin{bmatrix}a_{11}&*\\0&T_1\end{bmatrix}$, an upper triangular
invertible matrix.
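In practice the (permuted) LU decomposition is available in standard libraries. The sketch below (an illustration added here, not part of the notes) uses scipy.linalg.lu, which returns A = PLU with a permutation P; when all leading principal minors are non-zero, as assumed above, no row exchanges are actually needed.

```python
import numpy as np
from scipy.linalg import lu, lu_factor, lu_solve

A = np.array([[2., 1., 1.],
              [4., 3., 3.],
              [8., 7., 9.]])
P, L, U = lu(A)                     # permuted LU factorization A = P L U
print(np.allclose(P @ L @ U, A))    # True

# Solving Ax = b via the factorization (forward and back substitution internally).
b = np.array([4., 10., 24.])
x = lu_solve(lu_factor(A), b)
print(x, np.allclose(A @ x, b))
```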
We now proceed to understand the row-reduced echelon form (RREF) of a matrix. This understanding will be used to define the row-rank of a matrix in the next section. In subsequent sections, it will also be used to describe the solution set of the linear system Ax = b.
Definition. A matrix C is said to be in row-reduced echelon form (RREF)
1. if C is already in REF,
2. if the pivot of each non-zero row is 1, and
3. if every other entry in each pivotal column is 0.
Even though we have two pivots in examples 1 and 3, the matrix I2 doesn't appear as a
submatrix in the pivotal rows and columns. In the first one, we have $\begin{bmatrix}3&0\\0&1\end{bmatrix}$ as a submatrix
and in the third the corresponding submatrix is $\begin{bmatrix}1&1\\0&1\end{bmatrix}$.
We now give another example to indicate its application to the theory of the system of
linear equations.
1. There are exactly 3 pivots. These pivots can be in the columns {1, 2, 3}, {1, 2, 4}
or {1, 3, 4} as we have assumed A[:, 1] ≠ 0. The corresponding cases are given below.
(a) Pivots in the columns 1, 2, 3 ⇒ $[C\ \mathbf{d}]=\begin{bmatrix}1&0&0&d_1\\0&1&0&d_2\\0&0&1&d_3\end{bmatrix}$. Here, Ax = b is consistent,
with the unique solution $\begin{bmatrix}x\\y\\z\end{bmatrix}=\begin{bmatrix}d_1\\d_2\\d_3\end{bmatrix}$.
(b) Pivots in the columns 1, 2, 4 or 1, 3, 4 ⇒ [C d] equals $\begin{bmatrix}1&0&\alpha&0\\0&1&\beta&0\\0&0&0&1\end{bmatrix}$ or $\begin{bmatrix}1&\alpha&0&0\\0&0&1&0\\0&0&0&1\end{bmatrix}$.
Here, Ax = b is inconsistent for any choice of α, β as there is a row of [0 0 0 1]. This
corresponds to solving 0 · x + 0 · y + 0 · z = 1, an equation which has no solution.
2. There are exactly 2 pivots. These pivots can be in the columns {1, 2}, {1, 3} or
{1, 4} as we have assumed A[:, 1] ≠ 0. The corresponding cases are given below.
(a) Pivots in the columns 1, 2 or 1, 3 ⇒ [C d] equals $\begin{bmatrix}1&0&\alpha&d_1\\0&1&\beta&d_2\\0&0&0&0\end{bmatrix}$ or $\begin{bmatrix}1&\alpha&0&d_1\\0&0&1&d_2\\0&0&0&0\end{bmatrix}$.
Here, for the first matrix, the solution set equals
$$\begin{bmatrix}x\\y\\z\end{bmatrix}=\begin{bmatrix}d_1-\alpha z\\d_2-\beta z\\z\end{bmatrix}=\begin{bmatrix}d_1\\d_2\\0\end{bmatrix}+z\begin{bmatrix}-\alpha\\-\beta\\1\end{bmatrix},$$
where z is arbitrary. Here, z is called the "free variable" as z can be assigned any
value, and x and y are called "basic variables" as they can be written in terms of
the free variable z and constants.
(b) Pivots in the columns 1, 4 ⇒ $[C\ \mathbf{d}]=\begin{bmatrix}1&\alpha&\beta&0\\0&0&0&1\\0&0&0&0\end{bmatrix}$, which has a row of [0 0 0 1].
This corresponds to solving 0 · x + 0 · y + 0 · z = 1, an equation which has no solution.
3. There is exactly one pivot. In this case $[C\ \mathbf{d}]=\begin{bmatrix}1&\alpha&\beta&d_1\\0&0&0&0\\0&0&0&0\end{bmatrix}$. Here, Ax = b is consistent and has an infinite number of solutions for every choice of α, β as RREF([A b])
has no row of the form [0 0 0 1].
So, having seen the application of the RREF to the augmented matrix, let us proceed with the
algorithm, commonly known as the Gauss-Jordan Elimination (GJE), which helps us compute
the RREF.
1. Input: A ∈ Mm,n (R).
2. Output: a matrix B in RREF such that A is row equivalent to B.
3. Step 1: Put ‘Region’ = A.
4. Step 2: If all entries in the Region are 0, STOP. Else, in the Region, find the leftmost
nonzero column and find its topmost nonzero entry. Suppose this nonzero entry is aij = c (the pivot).
5. Step 3: Interchange the row containing the pivot with the top row of the region. Also,
make the pivot entry 1 by dividing this top row by c. Use this pivot to make other entries
in the pivotal column as 0.
6. Step 4: Put Region = the submatrix below and to the right of the current pivot. Now,
go to step 2.
Important: The process will stop, as we can get at most min{m, n} pivots.
Example 2.4.4. Apply GJE to $A=\begin{bmatrix}0&2&3&7\\1&1&1&1\\1&3&4&8\\0&0&0&1\end{bmatrix}$.
1. Region = A as A 6= 0.
2. Then, $E_{12}A=\begin{bmatrix}1&1&1&1\\0&2&3&7\\1&3&4&8\\0&0&0&1\end{bmatrix}$. Also, $E_{31}(-1)E_{12}A=\begin{bmatrix}1&1&1&1\\0&2&3&7\\0&2&3&7\\0&0&0&1\end{bmatrix}=B$ (say).
3. Now, Region = $\begin{bmatrix}2&3&7\\2&3&7\\0&0&1\end{bmatrix}\neq 0$. Then, $E_2(\tfrac{1}{2})B=\begin{bmatrix}1&1&1&1\\0&1&3/2&7/2\\0&2&3&7\\0&0&0&1\end{bmatrix}=C$ (say). Then,
$$E_{12}(-1)E_{32}(-2)C=\begin{bmatrix}1&0&-1/2&-5/2\\0&1&3/2&7/2\\0&0&0&0\\0&0&0&1\end{bmatrix}=D\ \text{(say)}.$$
4. Now, Region = $\begin{bmatrix}0&0\\0&1\end{bmatrix}$. Then, $E_{34}D=\begin{bmatrix}1&0&-1/2&-5/2\\0&1&3/2&7/2\\0&0&0&1\\0&0&0&0\end{bmatrix}$. Now, multiply on the left
by $E_{13}(\tfrac{5}{2})$ and $E_{23}(-\tfrac{7}{2})$ to get $\begin{bmatrix}1&0&-1/2&0\\0&1&3/2&0\\0&0&0&1\\0&0&0&0\end{bmatrix}$, a matrix in RREF. Thus, A is row
equivalent to F, where $F=\text{RREF}(A)=\begin{bmatrix}1&0&-1/2&0\\0&1&3/2&0\\0&0&0&1\\0&0&0&0\end{bmatrix}$.
5. Note that we have multiplied A on the left by the elementary matrices E12 , E31 (−1),
E2 (1/2), E32 (−2), E12 (−1), E34 , E23 (−7/2), E13 (5/2), i.e.,
F = E13 (5/2) E23 (−7/2) E34 E12 (−1) E32 (−2) E2 (1/2) E31 (−1) E12 A.
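The RREF computed above can be confirmed with SymPy's Matrix.rref(), as in the sketch below (added here for illustration, not part of the notes).

```python
from sympy import Matrix

# Example 2.4.4: rref() returns the RREF together with the pivot column indices.
A = Matrix([[0, 2, 3, 7],
            [1, 1, 1, 1],
            [1, 3, 4, 8],
            [0, 0, 0, 1]])
F, pivots = A.rref()
print(F)        # Matrix([[1, 0, -1/2, 0], [0, 1, 3/2, 0], [0, 0, 0, 1], [0, 0, 0, 0]])
print(pivots)   # (0, 1, 3): pivotal columns 1, 2 and 4 (0-indexed in SymPy)
```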
The proof of the next result is beyond the scope of this book and hence is omitted.
Theorem 2.4.6. Let A and B be two row equivalent matrices in RREF. Then A = B.
Corollary 2.4.7. The RREF of a matrix is unique.
Proof. Suppose there exists a matrix A having B and C as RREFs. As the RREFs are obtained
by left multiplication of elementary matrices, there exist elementary matrices E1 , . . . , Ek and
F1 , . . . , F` such that B = E1 · · · Ek A and C = F1 · · · F` A. Thus, B = E1 · · · Ek (F1 · · · F`)−1 C.
As the inverse of an elementary matrix is an elementary matrix, B and C are row equivalent.
As B and C are in RREF, using Theorem 2.4.6, B = C.
But, P B = [P A[:, 1], . . . , P A[:, s]] = [F [:, 1], . . . , F [:, s]]. As F is in RREF, its first s
columns are also in RREF. Thus, by Corollary 2.4.7, RREF(P B) = [F [:, 1], . . . , F [:, s]].
Now, a repeated application of Remark 2.4.8.2 implies RREF(B) = [F [:, 1], . . . , F [:, s]].
Thus, the required result follows.
Proposition 2.4.9. Let A ∈ Mn (R). Then, A is invertible if and only if RREF(A) = In , i.e.,
every invertible matrix is a product of elementary matrices.
Recall that if A ∈ Mn (C) is invertible then there exists a matrix B such that AB = In = BA.
So, we want to find a B such that
$$[\mathbf{e}_1\ \cdots\ \mathbf{e}_n]=I_n=AB=A\,[B[:,1]\ \cdots\ B[:,n]]=[AB[:,1]\ \cdots\ AB[:,n]].$$
So, if $B=[B[:,1]\ \cdots\ B[:,n]]$ is the matrix of unknowns then we need to solve the n systems
of linear equations AB[:, 1] = e1 , . . ., AB[:, n] = en . Thus, we have n augmented matrices
[A e1], . . . , [A en], which we combine into the single augmented matrix [A In] and row reduce.
For example, to compute the inverse of $A=\begin{bmatrix}0&0&1\\0&1&1\\1&1&1\end{bmatrix}$, we row reduce the augmented matrix
$[A\ \ I_3]=\begin{bmatrix}0&0&1&1&0&0\\0&1&1&0&1&0\\1&1&1&0&0&1\end{bmatrix}$. Applying GJE (the last operation being $E_{12}(-1)$) gives
$$\begin{bmatrix}1&0&0&0&-1&1\\0&1&0&-1&1&0\\0&0&1&1&0&0\end{bmatrix}.$$
Thus, $A^{-1}=\begin{bmatrix}0&-1&1\\-1&1&0\\1&0&0\end{bmatrix}$.
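A quick NumPy check of this computation (added here, not part of the notes; the matrix A is the one used above):

```python
import numpy as np

A = np.array([[0., 0., 1.],
              [0., 1., 1.],
              [1., 1., 1.]])
Ainv = np.linalg.inv(A)
print(Ainv)                                   # [[ 0. -1.  1.] [-1.  1.  0.] [ 1.  0.  0.]]
print(np.allclose(A @ Ainv, np.eye(3)))       # True
```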
Exercise 2.4.12. Use GJE to compute the inverse of $A=\begin{bmatrix}1&2&3\\1&3&2\\2&4&7\end{bmatrix}$ and $B=\begin{bmatrix}1&3&3\\2&3&2\\3&5&4\end{bmatrix}$.
2.5 Rank of a Matrix
Definition 2.5.1. Let A ∈ Mm,n (C). Then, the rank of A, denoted Rank(A), is the number
of pivots in the RREF(A).
Note that Rank(A) is defined using the number of pivots in RREF (A). These pivots
were obtained using the row operations. The question arises, what if we had applied column
operations? That is, what happens when we multiply by invertible matrices on the right?
44 CHAPTER 2. SYSTEM OF LINEAR EQUATIONS
Will the pivots using column operations remain the same or change? This question cannot be
answered at this stage. Using the ideas in vector spaces, we can show that the number of pivots
does not change and hence, we just use the word Rank(A).
We now illustrate the calculation of the rank by giving a few examples.
AB = 0 and $BA=\begin{bmatrix}-6&-12\\3&6\end{bmatrix}$. So, Rank(AB) = 0 ≠ 1 = Rank(BA). Observe that A
and B are not invertible. So, the rank can either remain the same or reduce.
7. Let $A=\begin{bmatrix}1&2&1&1&1\\2&3&1&2&2\\1&1&0&1&1\end{bmatrix}$. Then, Rank(A) = 2 as its REF has two pivots.
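The rank of the matrix in item 7 can also be checked numerically. The sketch below (added for illustration, not part of the notes) uses numpy.linalg.matrix_rank, which computes the rank via the SVD but agrees with the pivot count of the (R)REF.

```python
import numpy as np

A = np.array([[1, 2, 1, 1, 1],
              [2, 3, 1, 2, 2],
              [1, 1, 0, 1, 1]])
print(np.linalg.matrix_rank(A))   # 2
```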
Remark 2.5.3. Before proceeding further, for A, B ∈ Mm,n (C), we observe the following.
1. If A and B are row-equivalent then Rank(A) = Rank(B).
2. The number of pivots in the RREF(A) equals the number of pivots in REF of A. Hence,
one needs to compute only the REF to determine the rank.
Corollary 2.5.5. Let A ∈ Mm,n (R) and B ∈ Mn,q (R). Then, Rank(AB) ≤ Rank(A).
In particular, if B ∈ Mn (R) is invertible then Rank(AB) = Rank(A).
Proposition 2.5.6. Let A ∈ Mn (C) be an invertible matrix and let S be any subset of {1, 2, . . . , n}.
Then Rank(A[S, :]) = |S| and Rank(A[:, S]) = |S|.
Proof. Without loss of generality, let S = {1, . . . , r} and S c = {r + 1, . . . , n}. Write A1 = A[:, S]
and A2 = A[:, S c ]. Since A is invertible, RREF(A) = In . Hence, by Remark 2.4.8.3, there exists an invertible matrix P such that
$$[PA_1\ \ PA_2]=P[A_1\ \ A_2]=PA=I_n=\begin{bmatrix}I_r&0\\0&I_{n-r}\end{bmatrix}.$$
Thus, $PA_1=\begin{bmatrix}I_r\\0\end{bmatrix}$ and $PA_2=\begin{bmatrix}0\\I_{n-r}\end{bmatrix}$. So, using Corollary 2.5.5, Rank(A1 ) = r.
For the second part, let B1 = A[S, :], B2 = A[S c , :] and let Rank(B1 ) = t < s. Then, by
Exercise 2, there exists an s × s invertible matrix Q and a matrix C in RREF, of size t × n and
having exactly t pivots, such that
$$QB_1=\text{RREF}(B_1)=\begin{bmatrix}C\\0\end{bmatrix}.\tag{2.5.2}$$
Theorem 2.5.8. Let A ∈ Mm,n (R). If Rank(A) = r then there exist invertible matrices P and
Q such that
$$PAQ=\begin{bmatrix}I_r&0\\0&0\end{bmatrix}.$$
Proof. Let C = RREF(A). Then, by Remark 2.4.8.3 there exists an invertible matrix P such
that C = P A. Note that C has r pivots and they appear in columns, say i1 < i2 < · · · < ir .
Now, let Q1 = E1i1 E2i2 · · · Erir . As the Ejij 's are elementary matrices that interchange the
columns of C, one has $D=CQ_1=\begin{bmatrix}I_r&B\\0&0\end{bmatrix}$, where B ∈ Mr,n−r (R).
Now, let $Q_2=\begin{bmatrix}I_r&-B\\0&I_{n-r}\end{bmatrix}$ and Q = Q1 Q2 . Then Q is invertible and
$$PAQ=CQ=CQ_1Q_2=DQ_2=\begin{bmatrix}I_r&B\\0&0\end{bmatrix}\begin{bmatrix}I_r&-B\\0&I_{n-r}\end{bmatrix}=\begin{bmatrix}I_r&0\\0&0\end{bmatrix}.$$
Corollary 2.5.9. Let A ∈ Mm,n (R). If Rank(A) = r then there exist matrices B ∈ Mm,r (R)
and C ∈ Mr,n (R) such that Rank(B) = Rank(C) = r and A = BC. Furthermore, $A=\sum_{i=1}^{r}\mathbf{x}_i\mathbf{y}_i^T$,
for some $\mathbf{x}_i\in\mathbb{R}^m$ and $\mathbf{y}_i\in\mathbb{R}^n$.
Proof. By Theorem 2.5.8, there exist invertible matrices P and Q such that $PAQ=\begin{bmatrix}I_r&0\\0&0\end{bmatrix}$.
Or equivalently, $A=P^{-1}\begin{bmatrix}I_r&0\\0&0\end{bmatrix}Q^{-1}$. Decompose $P^{-1}=[B\ \ D]$ and $Q^{-1}=\begin{bmatrix}C\\F\end{bmatrix}$ such that
B ∈ Mm,r (R) and C ∈ Mr,n (R). Then Rank(B) = Rank(C) = r (see Proposition 2.5.6) and
$$A=[B\ \ D]\begin{bmatrix}I_r&0\\0&0\end{bmatrix}\begin{bmatrix}C\\F\end{bmatrix}=[B\ \ 0]\begin{bmatrix}C\\F\end{bmatrix}=BC.$$
Furthermore, assume that $B=[\mathbf{x}_1\ \cdots\ \mathbf{x}_r]$ and $C=\begin{bmatrix}\mathbf{y}_1^T\\\vdots\\\mathbf{y}_r^T\end{bmatrix}$. Then $A=BC=\sum_{i=1}^{r}\mathbf{x}_i\mathbf{y}_i^T$.
Proposition 2.5.10. Let A, B ∈ Mm,n (R). Then Rank(A + B) ≤ Rank(A) + Rank(B). In particular, if $A=\sum_{i=1}^{k}\mathbf{x}_i\mathbf{y}_i^T$, for some $\mathbf{x}_i\in\mathbb{R}^m$ and $\mathbf{y}_i\in\mathbb{R}^n$, 1 ≤ i ≤ k, then Rank(A) ≤ k.
Proof. Let Rank(A) = r. Then there exists an invertible matrix P and a matrix A1 ∈ Mr,n (R)
such that $PA=\text{RREF}(A)=\begin{bmatrix}A_1\\0\end{bmatrix}$. Then,
$$P(A+B)=PA+PB=\begin{bmatrix}A_1\\0\end{bmatrix}+\begin{bmatrix}B_1\\B_2\end{bmatrix}=\begin{bmatrix}A_1+B_1\\B_2\end{bmatrix}.$$
Thus, the required result follows. The other part follows, as Rank(xi yiT ) = 1, for 1 ≤ i ≤ k.
Exercise 2.5.11. 1. Let A ∈ Mm,n (R) be a matrix of rank 1. Then prove that A = xyT ,
for non-zero vectors x ∈ Rm and y ∈ Rn .
2. Let A ∈ Mm (R). If Rank(A) = 1 then prove that A2 = αA, for some scalar α.
3. Let $A=\begin{bmatrix}2&4&8\\1&3&2\end{bmatrix}$ and $B=\begin{bmatrix}1&0&0\\0&1&0\end{bmatrix}$.
(a) Find P and Q such that B = P AQ. Thus, $A=P^{-1}[I_2\ \ 0]Q^{-1}$.
(b) Define $G=Q\begin{bmatrix}I_2\\\mathbf{x}^T\end{bmatrix}P$. Then, verify that AGA = A. Hence, G is a g-inverse of A.
(c) In particular, if $\mathbf{x}=\mathbf{0}$ then $G=Q\begin{bmatrix}I_2\\\mathbf{0}^T\end{bmatrix}P$. In this case, verify that GAG = G,
(AG)T = AG and (GA)T = GA. Hence, this G is the pseudo-inverse of A.
4. Let $A=\begin{bmatrix}1&2&3\\2&1&1\end{bmatrix}$.
(a) Find a matrix G such that AG = I2 . Hint: Let $G=\begin{bmatrix}a&\alpha\\b&\beta\\c&\gamma\end{bmatrix}$. Now, use AG = I2 to
get the solution space and proceed.
(b) What can you say about the number of such matrices? Give reasons for your answer.
(c) Does the choice of G in part (a) also satisfies (AG)T = AG and (GA)T = GA? Give
reasons for your answer.
(d) Does there exist a matrix C such that CA = I3 ? Give reasons for your answer.
(e) Could you have used the ideas from Exercise 2.5.11.3 to get your answers?
In the first case, there is a pivot in the (n + 1)-th column of the augmented matrix [A b].
Thus, the column corresponding to b has a pivot. This implies b 6= 0. This implies that the
row corresponding to this pivot in RREF([A b]) has all entries before this pivot as 0. Thus,
in RREF([A b]) this pivotal row equals [0 0 · · · 0 1]. But, this corresponds to the equation
0 · x1 + 0 · x2 + · · · + 0 · xn = 1. This implies that Ax = b has no solution whenever Rank(A) < Rank([A b]).
Definition 2.6.1. Consider the linear system Ax = b and let RREF([A b]) = [C d]. Then,
the variables corresponding to the pivotal columns of C are called the basic variables and the
variables that correspond to non-pivotal columns are called free variables.
Then to get the solution set, observe that C has 4 pivotal columns, namely, the columns 1, 2, 5
and 6. Thus, x1 , x2 , x5 and x6 are basic variables. Therefore, the remaining variables x3 , x4 and
x7 are free variables. Hence, the solution set is given by
$$\begin{bmatrix}x_1\\x_2\\x_3\\x_4\\x_5\\x_6\\x_7\end{bmatrix}=\begin{bmatrix}-2x_3+x_4-2x_7\\-x_3-3x_4-5x_7\\x_3\\x_4\\ *\\ *\\x_7\end{bmatrix}=x_3\begin{bmatrix}-2\\-1\\1\\0\\0\\0\\0\end{bmatrix}+x_4\begin{bmatrix}1\\-3\\0\\1\\0\\0\\0\end{bmatrix}+x_7\begin{bmatrix}-2\\-5\\0\\0\\ *\\ *\\1\end{bmatrix},$$
where the entries $*$ for the basic variables x5 and x6 come from the corresponding rows of [C d].
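Computer algebra systems express solution sets in exactly this basic/free-variable form. The SymPy sketch below (an illustration with a made-up 2 x 4 system, not the one from the notes) shows the solution parametrised by the free variables.

```python
from sympy import Matrix, symbols, linsolve

x1, x2, x3, x4 = symbols('x1 x2 x3 x4')
A = Matrix([[1, 0, 2, -1],
            [0, 1, 1,  3]])
b = Matrix([4, 5])
print(linsolve((A, b), [x1, x2, x3, x4]))
# {(4 - 2*x3 + x4, 5 - x3 - 3*x4, x3, x4)}: x1, x2 basic; x3, x4 free
```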
Theorem 2.6.3. Let Ax = b be a linear system in n variables with RREF([A b]) = [C d].
Proof. Part 1: As Rank([A b]) > Rank(A), by Remark 2.4.8.4 ([C d])[r + 1, :] = [0T 1]. Note
that this row corresponds to the linear equation
0 · x1 + 0 · x2 + · · · + 0 · xn = 1
$$x_{i_\ell}+\sum_{k=1}^{n-r}c_{\ell t_k}x_{t_k}=d_\ell\iff x_{i_\ell}=d_\ell-\sum_{k=1}^{n-r}c_{\ell t_k}x_{t_k}.$$
that Ax0 = b and, for 1 ≤ i ≤ n − r, Aui = 0. Also, by Equation (2.6.3) the solution set has
indeed the required form, where ki corresponds to the free variable xti . As there is at least one
free variable the system has infinite number of solutions.
Thus, note that the solution set of Ax = b depends on the rank of the coefficient matrix, the
rank of the augmented matrix and the number of unknowns. In some sense, it is independent
of the choice of m.
Exercise 2.6.4. Consider the linear system given below. Use GJE to find the RREF of its
augmented matrix and use it to find the solution.
x + y − 2u + v = 2, z + u + 2v = 3, v + w = 3, v + 2w = 5.
Let A ∈ Mm,n (R). Then, Rank(A) ≤ m. Thus, using Theorem 2.6.3 the next result follows.
Corollary 2.6.5. Let A ∈ Mm,n (R). If Rank(A) = r < n then the homogeneous system Ax = 0
has at least one non-trivial solution.
Remark 2.6.6. Let A ∈ Mm,n (R). Then, Theorem 2.6.3 implies that Ax = b is consistent
if and only if Rank(A) = Rank([A b]). Further, the vectors ui's associated with the free
variables satisfy Aui = 0, i.e., they are solutions of the associated homogeneous system.
Example 2.6.7. 1. Determine the equation of the circle passing through the points (−1, 4), (0, 1)
and (1, 4).
Solution: The equation a(x2 + y 2 ) + bx + cy + d = 0, for a, b, c, d ∈ R, represents a circle.
Since this curve passes through the given points, we get a homogeneous system having 3
equations in 4 unknowns, namely
$$\begin{bmatrix}(-1)^2+4^2&-1&4&1\\0^2+1^2&0&1&1\\1^2+4^2&1&4&1\end{bmatrix}\begin{bmatrix}a\\b\\c\\d\end{bmatrix}=\mathbf{0}.$$
Solving this system, we get $[a,b,c,d]=[\tfrac{3}{13}d,\,0,\,-\tfrac{16}{13}d,\,d]$. Hence, choosing d = 13, the
required circle is given by $3(x^2+y^2)-16y+13=0$ (a numerical check appears after this example).
2. Determine the equation of the plane that contains the points (1, 1, 1), (1, 3, 2) and (2, −1, 2).
Solution: The general equation of a plane in space is given by ax + by + cz + d = 0,
where a, b, c and d are unknowns. Since this plane passes through the 3 given points, we
get a homogeneous system in 3 equations and 4 variables. So, it has a non-trivial solution,
namely $[a,b,c,d]=[-\tfrac{4}{3}d,\,-\tfrac{1}{3}d,\,\tfrac{2}{3}d,\,d]$. Hence, choosing d = 3, the required plane is given
by −4x − y + 2z + 3 = 0.
3. Let $A=\begin{bmatrix}2&3&4\\0&-1&0\\0&-3&4\end{bmatrix}$. Then, find a non-trivial solution of Ax = 2x. Does there exist a
nonzero vector y ∈ R3 such that Ay = 4y?
Solution: Solving Ax = 2x is equivalent to solving (A − 2I)x = 0. The augmented
matrix of this system equals $\begin{bmatrix}0&3&4&0\\0&-3&0&0\\0&-3&2&0\end{bmatrix}$. Verify that $\mathbf{x}^T=[1,0,0]$ is a nonzero
solution.
For the other part, the augmented matrix for solving (A − 4I)y = 0 equals
$\begin{bmatrix}-2&3&4&0\\0&-5&0&0\\0&-3&0&0\end{bmatrix}$. Thus, verify that $\mathbf{y}^T=[2,0,1]$ is a nonzero solution.
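The circle of Example 2.6.7.1 can be recovered numerically as a null-space computation, as in the sketch below (added here for illustration, not part of the notes).

```python
import numpy as np

# Circle through (-1, 4), (0, 1), (1, 4): coefficients span the null space
# of the 3 x 4 coefficient matrix built from a(x^2 + y^2) + bx + cy + d = 0.
pts = [(-1, 4), (0, 1), (1, 4)]
M = np.array([[x**2 + y**2, x, y, 1] for (x, y) in pts], dtype=float)

_, _, Vt = np.linalg.svd(M)            # last right-singular vector spans the null space
a, b, c, d = Vt[-1]
coeffs = np.array([a, b, c, d]) * (13 / d)   # rescale so that d = 13
print(np.round(coeffs, 6))                   # [  3.   0. -16.  13.]
print(np.allclose(M @ coeffs, 0))            # True
```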
Exercise 2.6.8. 1. Let A ∈ Mn (R). If A2 x = 0 has a non trivial solution then show that
Ax = 0 also has a non trivial solution.
2. Let u = (1, 1, −2)T and v = (−1, 2, 3)T . Find condition on x, y and z such that the system
cu + dv = (x, y, z)T in the unknowns c and d is consistent.
3. Find condition(s) on x, y, z so that the systems given below (in the unknowns a, b and c)
are consistent.
(a) a + 2b − 3c = x, 2a + 6b − 11c = y, a − 2b + 7c = z.
DR
(b) a + b + 5c = x, a + 3c = y, 2a − b + 4c = z.
4. For what values of c and k, the following systems have i) no solution, ii) a unique
solution and iii) infinite number of solutions.
(a) x + y + z = 3, x + 2y + cz = 4, 2x + 3y + 2cz = k.
(b) x + y + z = 3, x + y + 2cz = 7, x + 2y + 3cz = k.
(c) x + y + 2z = 3, x + 2y + cz = 5, x + 2y + 4z = k.
(d) x + 2y + 3z = 4, 2x + 5y + 5z = 6, 2x + (c2 − 6)z = c + 20.
(e) x + y + z = 3, 2x + 5y + 4z = c, 3x + (c2 − 8)z = 12.
5. Consider the linear system Ax = b in m equations and 3 unknowns. Then, for each of
the given solution sets, determine the possible choices of m. Further, for each choice of
m, determine a choice of A and b.
Theorem 2.7.1. Let A ∈ Mn (R). Then, the following statements are equivalent.
1. A is invertible.
2. RREF(A) = In .
4. Rank(A) = n.
1 ⟹ 5: If A is invertible and x0 is a solution of the homogeneous system Ax = 0 then
x0 = (A−1A)x0 = A−1(Ax0) = A−10 = 0. Thus, 0 is the only solution.
5 =⇒ 1 Ax = 0 has only the trivial solution implies that there are no free variables. So,
all the unknowns are basic variables. So, each column is a pivotal column. Thus, RREF(A) = In .
1⇒6 Note that x0 = A−1 b is the unique solution of Ax = b.
6 ⟹ 7: A unique solution implies that there is at least one solution. So, nothing to show.
7 =⇒ 1 Given assumption implies that for 1 ≤ i ≤ n, the linear system Ax = ei has a
solution, say ui . Define B = [u1 , u2 , . . . , un ]. Then AB = [Au1, . . . , Aun] = [e1, . . . , en] = In. Now, let x0 be a solution of the homogeneous system Bx = 0. Then
x0 = In x0 = (AB)x0 = A(Bx0 ) = A0 = 0.
Thus, the homogeneous system Bx = 0 has only the trivial solution. Hence, using Part 5, B
is invertible. As AB = In and B is invertible, we get BA = In . Thus AB = In = BA. Thus, A
is invertible as well.
We now give an immediate application of Theorem 2.7.1 without proof.
Theorem 2.7.2. The following two statements cannot hold together for A ∈ Mn (R).
As an immediate consequence of Theorem 2.7.1, the readers should prove that one needs to
compute either the left or the right inverse to prove invertibility of A ∈ Mn (R).
Corollary 2.7.4. (Theorem of the Alternative) The following two statements cannot hold
together for A ∈ Mn (C) and b ∈ Rn .
Note that one of the requirements in the last corollary is yT b ≠ 0. Thus, we want non-zero
vectors x0 and y0 in Rn such that they are solutions of Ax = b and yT A = 0T , respectively,
with the added condition that y0 and b are not orthogonal or perpendicular (their dot product
is not zero).
Exercise 2.7.5. 1. Give the proof of Theorem 2.7.2 and Corollary 2.7.3.
2. Let A ∈ Mn,m (R) and B ∈ Mm,n (R). Either use Theorem 2.7.1.5 or multiply the matrices
to verify the following statements.
3. Let bT = [1, 2, −1, −2]. Suppose A is a 4 × 4 matrix such that the linear system Ax = b
has no solution. Mark each of the statements given below as true or false?
2.8 Determinant
1 2 3 " #
1 2
Recall the notations used in Section 1.5 on Page 21 . If A = 1 3 2 then A(1 | 2) = 2 7
2 4 7
and A({1, 2} | {1, 3}) = [4]. The actual definition of the determinant requires an understanding
of group theory. So, we will just give an inductive definition which will help us to compute
the determinant and a few results. The advanced students can find the main definition of the
determinant in Appendix 9.2.22, where it is proved that the definition given below corresponds
to the expansion of determinant along the first row.
Definition 2.8.1. Let A be a square matrix of order n. Then, the determinant of A, denoted
det(A) (or | A | ) is defined by
$$\det(A)=\begin{cases}a,&\text{if }A=[a]\text{ (corresponds to }n=1),\\[4pt]\displaystyle\sum_{j=1}^{n}(-1)^{1+j}a_{1j}\det\big(A(1\,|\,j)\big),&\text{otherwise.}\end{cases}$$
Thus, for A = [aij ] ∈ M3 (R),
$$\det(A)=|A|=a_{11}\det(A(1|1))-a_{12}\det(A(1|2))+a_{13}\det(A(1|3))$$
$$=a_{11}\begin{vmatrix}a_{22}&a_{23}\\a_{32}&a_{33}\end{vmatrix}-a_{12}\begin{vmatrix}a_{21}&a_{23}\\a_{31}&a_{33}\end{vmatrix}+a_{13}\begin{vmatrix}a_{21}&a_{22}\\a_{31}&a_{32}\end{vmatrix}$$
$$=a_{11}(a_{22}a_{33}-a_{23}a_{32})-a_{12}(a_{21}a_{33}-a_{31}a_{23})+a_{13}(a_{21}a_{32}-a_{31}a_{22}).$$
For $A=\begin{bmatrix}1&2&3\\2&3&1\\1&2&2\end{bmatrix}$, $\det(A)=1\cdot\begin{vmatrix}3&1\\2&2\end{vmatrix}-2\cdot\begin{vmatrix}2&1\\1&2\end{vmatrix}+3\cdot\begin{vmatrix}2&3\\1&2\end{vmatrix}=4-2(3)+3(1)=1$.
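Definition 2.8.1 translates directly into a recursive procedure. The Python sketch below (added for illustration, not part of the notes) expands along the first row; it is only practical for small matrices.

```python
def det(A):
    """Determinant by expansion along the first row (Definition 2.8.1).
    A is a list of lists; exponential cost, fine for small matrices."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in A[1:]]   # A(1 | j)
        total += (-1) ** j * A[0][j] * det(minor)
    return total

print(det([[1, 2, 3], [2, 3, 1], [1, 2, 2]]))   # 1, as in the example above
```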
Exercise 2.8.3. Find the determinant of the following matrices.
$$i)\ \begin{bmatrix}1&2&7&8\\0&4&3&2\\0&0&2&3\\0&0&0&5\end{bmatrix}\quad ii)\ \begin{bmatrix}3&0&0&1\\0&2&0&5\\6&-7&1&0\\3&2&0&6\end{bmatrix}\quad iii)\ \begin{bmatrix}1&a&a^2\\1&b&b^2\\1&c&c^2\end{bmatrix}.$$
It turns out that the determinant of a matrix equals the volume of the parallelepiped formed
using the columns of the matrix. With this understanding, the singularity of A gets related with
the dimension in which we are looking at the parallelepiped. For, example, the length makes
sense in one-dimension but it doesn’t make sense to talk of area (which is a two-dimensional
idea) of a line segment. Similarly, it makes sense to talk of volume of a cube but it doesn’t make
sense to talk of the volume of a square or rectangle or parallelogram which are two-dimensional
objects.
We now state a few properties of the determinant function. For proof, see Appendix 9.3.
1. det(In ) = 1.
Thus, using Theorem 2.8.5, det(A) = 2 · (1 · 2 · (−1)) = −4, where the first 2 appears from the
elementary matrix $E_1(\tfrac{1}{2})$.
Exercise 2.8.7. Prove the following without computing the determinant (use Theorem 2.8.5).
1. Let $A=[\mathbf{u}\ \ \mathbf{v}\ \ 2\mathbf{u}+3\mathbf{v}]$, where u, v ∈ R3 . Then, det(A) = 0.
2. Let $A=\begin{bmatrix}a&b&c\\e&f&g\\h&j&\ell\end{bmatrix}$. If x ≠ 0 and $B=\begin{bmatrix}a&e&x^2a+xe+h\\b&f&x^2b+xf+j\\c&g&x^2c+xg+\ell\end{bmatrix}$ then det(A) = det(B).
Hence, conclude that 3 divides $\begin{vmatrix}3&1&2\\4&7&1\\1&4&-2\end{vmatrix}$.
Remark 2.8.8. Theorem 2.8.5.3 implies that the determinant can be calculated by expanding
along any row. Hence, the readers are advised to verify that
$$\det(A)=\sum_{j=1}^{n}(-1)^{k+j}a_{kj}\det(A(k\,|\,j)),\quad\text{for }1\leq k\leq n.$$
Example 2.8.9. Using Remark 2.8.8 (expanding along the second row), one has
$$\begin{vmatrix}2&2&6&1\\0&0&2&1\\0&1&2&0\\1&2&1&1\end{vmatrix}=(-1)^{2+3}\cdot 2\cdot\begin{vmatrix}2&2&1\\0&1&0\\1&2&1\end{vmatrix}+(-1)^{2+4}\cdot 1\cdot\begin{vmatrix}2&2&6\\0&1&2\\1&2&1\end{vmatrix}=-2\cdot 1+(-8)=-10.$$
Definition 2.8.10. Let A ∈ Mn (R). Then, the cofactor matrix, denoted Cof(A), is an Mn (R)
matrix with Cof(A) = [Cij ], where $C_{ij}=(-1)^{i+j}\det\big(A(i\,|\,j)\big)$.
And, the Adjugate (classical Adjoint) of A, denoted Adj(A), equals Cof(A)T.
Example 2.8.11. Let $A=\begin{bmatrix}1&2&3\\2&3&1\\1&2&4\end{bmatrix}$. Then,
$$\text{Adj}(A)=\text{Cof}(A)^T=\begin{bmatrix}C_{11}&C_{21}&C_{31}\\C_{12}&C_{22}&C_{32}\\C_{13}&C_{23}&C_{33}\end{bmatrix}=\begin{bmatrix}(-1)^{1+1}\det(A(1|1))&(-1)^{2+1}\det(A(2|1))&(-1)^{3+1}\det(A(3|1))\\(-1)^{1+2}\det(A(1|2))&(-1)^{2+2}\det(A(2|2))&(-1)^{3+2}\det(A(3|2))\\(-1)^{1+3}\det(A(1|3))&(-1)^{2+3}\det(A(2|3))&(-1)^{3+3}\det(A(3|3))\end{bmatrix}=\begin{bmatrix}10&-2&-7\\-7&1&5\\1&0&-1\end{bmatrix}.$$
Now, verify that $A\,\text{Adj}(A)=\begin{bmatrix}-1&0&0\\0&-1&0\\0&0&-1\end{bmatrix}=\begin{bmatrix}\det(A)&0&0\\0&\det(A)&0\\0&0&\det(A)\end{bmatrix}=\text{Adj}(A)\,A$.
Consider $xI_3-A=\begin{bmatrix}x-1&-2&-3\\-2&x-3&-1\\-1&-2&x-4\end{bmatrix}$. Then,
$$\text{Adj}(xI-A)=\begin{bmatrix}C_{11}&C_{21}&C_{31}\\C_{12}&C_{22}&C_{32}\\C_{13}&C_{23}&C_{33}\end{bmatrix}=\begin{bmatrix}x^2-7x+10&2x-2&3x-7\\2x-7&x^2-5x+1&x+5\\x+1&2x&x^2-4x-1\end{bmatrix}=x^2I+x\begin{bmatrix}-7&2&3\\2&-5&1\\1&2&-4\end{bmatrix}+\text{Adj}(A)=x^2I+Bx+C\ \text{(say)}.$$
That is, we have obtained a matrix identity. Hence, replacing x by A makes sense. But, then
the LHS is 0. So, for the RHS to be zero, we must have A3 − 8A2 + 10A − det(A)I = 0 (this
equality is famously known as the Cayley-Hamilton Theorem).
The next result relates adjugate matrix with the inverse, in case det(A) 6= 0.
2. Then $\sum_{j=1}^{n}a_{ij}C_{\ell j}=\sum_{j=1}^{n}a_{ij}(-1)^{\ell+j}\det\big(A(\ell\,|\,j)\big)=0$, for $i\neq\ell$.
Proof. Part 1: It follows directly from Remark 2.8.8 and the definition of the cofactor.
Part 2: Fix positive integers i, ` with 1 ≤ i 6= ` ≤ n. Suppose that the i-th and `-th rows of
B are equal to the i-th row of A and B[t, :] = A[t, :], for t 6= i, `. Since two rows of B are equal,
det(B) = 0. Now, let us expand the determinant of B along the `-th row. We see that
$$0=\det(B)=\sum_{j=1}^{n}(-1)^{\ell+j}b_{\ell j}\det\big(B(\ell\,|\,j)\big)\tag{2.8.2}$$
$$=\sum_{j=1}^{n}(-1)^{\ell+j}a_{ij}\det\big(B(\ell\,|\,j)\big)\qquad(\text{as }b_{\ell j}=a_{ij}\text{ for all }j)$$
$$=\sum_{j=1}^{n}(-1)^{\ell+j}a_{ij}\det\big(A(\ell\,|\,j)\big)=\sum_{j=1}^{n}a_{ij}C_{\ell j}.\tag{2.8.3}$$
The next result gives another equivalent condition for a square matrix to be invertible.
Proof. Let A be non-singular. Then, det(A) ≠ 0 and hence $A^{-1} = \frac{1}{\det(A)} \mathrm{Adj}(A)$.
Now, let us assume that A is invertible. Then, using Theorem 2.7.1, A = E1 · · · Ek , a product
of elementary matrices. Thus, a repeated application of Parts 3, 4 and 5 of Theorem 2.8.5 gives
det(A) ≠ 0.
The next result relates the determinant of a matrix with the determinant of its transpose. Thus,
the determinant can be computed by expanding along any column as well.
Theorem 2.8.16. Let $A \in M_n(\mathbb{R})$. Then $\det(A) = \det(A^T)$. Further, $\det(A^*) = \overline{\det(A)}$.
Proof. If A is singular then, by Theorem 2.8.15, A is not invertible. So, AT is also not invertible
and hence by Theorem 2.8.15, det(AT ) = 0 = det(A).
Now, let A be a non-singular and let AT = B. Then, by definition,
$$\det(A^T) = \det(B) = \sum_{j=1}^{n} (-1)^{1+j} b_{1j} \det B(1 \mid j) = \sum_{j=1}^{n} (-1)^{1+j} a_{j1} \det A(j \mid 1) = \sum_{j=1}^{n} a_{j1} C_{j1} = \det(A)$$
using Corollary 2.8.14. Further, using induction and the first part, one has
$$\det(A^*) = \det\big((\overline{A})^T\big) = \det(\overline{A}) = \sum_{j=1}^{n} (-1)^{1+j} \overline{a_{1j}} \det \overline{A}(1 \mid j) = \overline{\sum_{j=1}^{n} (-1)^{1+j} a_{1j} \det A(1 \mid j)} = \overline{\det(A)}.$$
Case 2: Let A be singular. Then, by Theorem 2.8.15, A is not invertible. So, by Proposi-
2. Let A and B be two matrices having positive entries and of orders 1 × n and n × 1, respectively. Which of BA or AB is invertible? Give reasons.
Consider the linear system Ax = b. Then, using Theorems 2.7.1 and 2.8.15, we conclude that Ax = b has a unique solution for every b if and only if det(A) ≠ 0. The next theorem, commonly known as Cramer's rule, gives a direct method of finding the solution of the linear system Ax = b when det(A) ≠ 0.
Theorem 2.8.20. Let A be an n × n non-singular matrix. Then, the unique solution of the linear system Ax = b is given by
$$x_j = \frac{\det(A_j)}{\det(A)}, \quad \text{for } j = 1, 2, \ldots, n,$$
where $A_j$ is the matrix obtained from A by replacing the j-th column of A, namely A[:, j], by b.
Proof. Since det(A) ≠ 0, A is invertible. Thus $A^{-1}[A \mid b] = [I \mid A^{-1}b]$. Let $d = A^{-1}b$. Then Ax = b has the unique solution $x_j = d_j$, for 1 ≤ j ≤ n. Thus,
$$A^{-1} A_j = A^{-1}\big[A[:,1], \ldots, A[:,j-1],\, b,\, A[:,j+1], \ldots, A[:,n]\big] = \big[A^{-1}A[:,1], \ldots, A^{-1}A[:,j-1],\, A^{-1}b,\, A^{-1}A[:,j+1], \ldots, A^{-1}A[:,n]\big] = \big[e_1, \ldots, e_{j-1},\, d,\, e_{j+1}, \ldots, e_n\big].$$
Since every column of the last matrix other than the j-th is a standard basis vector, its determinant equals $d_j$, i.e., $\det(A_j) = \det(A)\, d_j$. Hence, $x_j = \frac{\det(A_j)}{\det(A)}$ and the required result follows.
Example 2.8.21. Solve Ax = b using Cramer's rule, where $A = \begin{pmatrix} 1 & 2 & 3\\ 2 & 3 & 1\\ 1 & 2 & 2 \end{pmatrix}$ and $b = \begin{pmatrix} 1\\ 1\\ 1 \end{pmatrix}$.
Solution: Check that det(A) = 1 and $x = [-1, 1, 0]^T$ as
$$x_1 = \begin{vmatrix} 1 & 2 & 3\\ 1 & 3 & 1\\ 1 & 2 & 2 \end{vmatrix} = -1, \quad x_2 = \begin{vmatrix} 1 & 1 & 3\\ 2 & 1 & 1\\ 1 & 1 & 2 \end{vmatrix} = 1, \quad \text{and} \quad x_3 = \begin{vmatrix} 1 & 2 & 1\\ 2 & 3 & 1\\ 1 & 2 & 1 \end{vmatrix} = 0.$$
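Cramer's rule translates directly into a few lines of code. The sketch below is illustrative only (the function name cramer is ours); it replaces each column of A by b, takes determinants, and reproduces the solution $x = (-1, 1, 0)^T$ of Example 2.8.21.

```python
import numpy as np

def cramer(A, b):
    """Solve Ax = b for a non-singular A using Cramer's rule."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    dA = np.linalg.det(A)
    x = np.empty(A.shape[1])
    for j in range(A.shape[1]):
        Aj = A.copy()
        Aj[:, j] = b                    # replace the j-th column by b
        x[j] = np.linalg.det(Aj) / dA
    return x

print(cramer([[1, 2, 3], [2, 3, 1], [1, 2, 2]], [1, 1, 1]))   # [-1.  1.  0.]
```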
2. Let A be a unitary matrix. What can you say about |det(A)|?
10. Determine necessary and sufficient condition for a triangular matrix to be invertible.
11. Let A and B be two non-singular matrices. Are the matrices A + B and A − B non-
singular? Justify your answer.
12. For what value(s) of λ does the following systems have non-trivial solutions? Also, for
each value of λ, determine a non-trivial solution.
14. Let A = [aij ] ∈ Mn (R) with aij = max{i, j}. Prove that det A = (−1)n−1 n.
15. Let p ∈ R, p 6= 0. Let A = [aij ], B = [bij ] ∈ Mn (R) with bij = pi−j aij , for 1 ≤ i, j ≤ n.
Then, compute det(B) in terms of det(A).
16. The position of an element aij of a determinant is called even or odd according as i + j is
even or odd. Prove that if all the entries in
(a) odd positions are multiplied with −1 then the value of determinant doesn’t change.
(b) even positions are multiplied with −1 then the value of determinant
i. does not change if the matrix is of even order.
ii. is multiplied by −1 if the matrix is of odd order.
2.10 Summary
In this chapter, we started with a system of m linear equations in n variables and formally wrote it as Ax = b and, in turn, as the augmented matrix [A | b]. Then, the basic operations on equations led to multiplication by elementary matrices on the left of [A | b]. These elementary matrices are invertible and applying GJE to a matrix A results in the RREF of A. We used the pivots in the RREF to define the rank of a matrix. So, if Rank(A) = r and Rank([A | b]) = ra
We have also seen that the following conditions are equivalent for A ∈ Mn (R).
1. A is invertible.
7. Rank(A) = n.
8. det(A) 6= 0.
1. Solving the linear system Ax = b. This idea will lead to the question “is the vector b a
linear combination of the columns of A”?
2. Solving the linear system Ax = 0. This will lead to the question “are the columns of A
linearly independent/dependent”? In particular, we will see that
(a) if Ax = 0 has a unique solution then the columns of A are linearly independent.
(b) if Ax = 0 has a non-trivial solution then the columns of A are linearly dependent.
Chapter 3
Vector Spaces
In this chapter, we will mainly be concerned with finite dimensional vector spaces over R or C. Please note that the real and complex numbers have the property that any pair of elements can be added, subtracted or multiplied. Also, division by a non-zero element is allowed. Such sets are called fields in mathematics. So, Q, R and C are examples of fields, and they have infinitely many elements. But, in mathematics, we do have fields that have only finitely many elements. For example, consider the set Z5 = {0, 1, 2, 3, 4}. In Z5, we define addition and multiplication, respectively, as
  +  | 0 1 2 3 4               ·  | 0 1 2 3 4
  ---+-----------              ---+-----------
  0  | 0 1 2 3 4               0  | 0 0 0 0 0
  1  | 1 2 3 4 0               1  | 0 1 2 3 4
  2  | 2 3 4 0 1               2  | 0 2 4 1 3
  3  | 3 4 0 1 2               3  | 0 3 1 4 2
  4  | 4 0 1 2 3               4  | 0 4 3 2 1
Then, we see that the elements of Z5 can be added, subtracted and multiplied. Note that 4
behaves as −1 and 3 behaves as −2. Thus, 1 behaves as −4 and 2 behaves as −3. Also, we see
that in this multiplication 2 · 3 = 1 and 4 · 4 = 1. Hence,
1. the division by 2 is similar to multiplying by 3,
2. the division by 3 is similar to multiplying by 2, and
3. the division by 4 is similar to multiplying by 4.
Thus, Z5 indeed behaves like a field. So, in this chapter, F will represent a field.
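The arithmetic of Z5 can be tabulated with a few lines of code. The sketch below is illustrative only; it builds the two tables modulo 5 and finds the multiplicative inverse of every non-zero element, confirming the divisions described above.

```python
p = 5
add = [[(a + b) % p for b in range(p)] for a in range(p)]
mul = [[(a * b) % p for b in range(p)] for a in range(p)]
print("addition table:", add)
print("multiplication table:", mul)

# Every non-zero element has a multiplicative inverse, so Z5 is a field.
for a in range(1, p):
    inv = next(b for b in range(1, p) if (a * b) % p == 1)
    print(a, "has inverse", inv)   # 2 and 3 are inverses of each other, 4 is its own inverse
```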
(a) α · (u + v) = (α · u) + (α · v).
(b) (α + β) · u = (α · u) + (β · u).
So, we want the above properties to hold for any collection of vectors. Thus, formally, we have
the following definition.
Definition 3.1.1. A vector space V over F, denoted V(F) or in short V (if the field F is clear from the context), is a non-empty set in which one can define vector addition and scalar multiplication. Further, with these definitions, the properties of vector addition, scalar multiplication and the distributive laws (see items 1, 2 and 3 above) are satisfied.
w1 = w1 + 0 = w1 + (u + w2 ) = (w1 + u) + w2 = 0 + w2 = w2 .
Hence, we represent this unique vector by −u and call it the additive inverse.
5. If V is a vector space over R then V is called a real vector space.
6. If V is a vector space over C then V is called a complex vector space.
7. In general, a vector space over R or C is called a linear space.
Some interesting consequences of Definition 3.1.1 are stated next. Intuitively, they seem obvious. The proofs are given for a better understanding of the given conditions.
1. u + v = u implies v = 0.
Proof. Part 1: By Condition 1d and Remark 3.1.2.4, for each u ∈ V there exists −u ∈ V such
that −u + u = 0. Hence u + v = u implies
0 = −u + u = −u + (u + v) = (−u + u) + v = 0 + v = v.
Example 3.1.4. The readers are advised to justify the statements given below.
2. Let A ∈ Mm,n (F) and define V = {x ∈ Mn,1 (F) : Ax = 0}. Then, by Theorem 2.1.7, V
satisfies:
(a) 0 ∈ V as A0 = 0.
(b) if x ∈ V then αx ∈ V, for all α ∈ F. In particular, for α = −1, −x ∈ V.
(c) if x, y ∈ V then, for any α, β ∈ F, αx + βy ∈ V.
3. Consider R with the usual addition and multiplication. Then R forms a real vector space.
√
Recall that the symbol i represents the complex number −1.
6. Fix m, n ∈ N and let Mm,n (C) = {Am×n = [aij ] | aij ∈ C}. Then, with usual addition
and scalar multiplication of matrices, Mm,n (C) is a complex vector space. If m = n, the
vector space Mm,n (C) is denoted by Mn (C).
8. Fix a, b ∈ R with a < b and let C([a, b], R) = {f : [a, b] → R | f is continuous}. Then,
C([a, b], R) with (f + αg)(x) = f (x) + αg(x), for all x ∈ [a, b], is a real vector space.
10. Fix a < b ∈ R and let C 2 ((a, b), R) = {f : (a, b) → R | f 00 is continuous}. Then,
C 2 ((a, b), R) with (f + αg)(x) = f (x) + αg(x), for all x ∈ (a, b), is a real vector space.
11. Let R[x] = {a0 + a1x + · · · + an xⁿ | ai ∈ R, for 0 ≤ i ≤ n}. Now, let p(x), q(x) ∈ R[x], say p(x) = a0 + a1x + · · · + am xᵐ and q(x) = b0 + b1x + · · · + bm xᵐ (appending zero coefficients, if needed, so that both have the same number of terms). Define p(x) + q(x) = (a0 + b0) + (a1 + b1)x + · · · + (am + bm)xᵐ and αp(x) = (αa0) + (αa1)x + · · · + (αam)xᵐ, for α ∈ R. With these operations of "component-wise addition and multiplication", it can be easily verified that R[x] forms a real vector space.
12. Fix n ∈ N and let R[x; n] = {p(x) ∈ R[x] | p(x) has degree ≤ n}. Then, with component-
wise addition and multiplication, the set R[x; n] forms a real vector space.
13. Let V and W be vector spaces over F, with operations (+, •) and (⊕, ), respectively. Let
V × W = {(v, w) | v ∈ V, w ∈ W}. Then, V × W forms a vector space over F, if for every
(v1 , w1 ), (v2 , w2 ) ∈ V × W and α ∈ R, we define
v1 +v2 and w1 ⊕w2 on the right hand side mean vector addition in V and W, respectively.
Similarly, α • v1 and α w1 correspond to scalar multiplication in V and W, respectively.
Note that R2 is similar to R × R, where the operations are the same in both spaces.
(a) R is a vector space over Q. In this space, all the irrational numbers are vectors but
not scalars.
√
(b) V = {a + b 2 : a, b ∈ Q} is a vector space.
√ √ √
(c) V = {a + b 2 + c 3 + d 6 : a, b, c, d ∈ Q} is a vector space.
√
(d) V = {a + b −3 : a, b ∈ Q} is a vector space.
Then, R2 is a real vector space with (−1, 3)T as the additive identity.
17. Recall the field Z5 = {0, 1, 2, 3, 4} given on the first page of this chapter. Then, V =
{(a, b) | a, b ∈ Z5 } is a vector space over Z5 having 25 elements/vectors.
From now on, we will use ‘u + v’ for ‘u ⊕ v’ and ‘αu or α · u’ for ‘α u’.
Exercise 3.1.6. 1. Verify that the vector spaces mentioned in Example 3.1.4 do satisfy all
Then, does V form a vector space under any of the two operations?
Definition 3.1.7. Let V be a vector space over F. Then, a non-empty subset W of V is called
a subspace of V if W is also a vector space with vector addition and scalar multiplication in
W coming from that in V (compute the vector addition and scalar multiplication in V and then
the computed vector should be an element of W).
Example 3.1.8.
3. Let V be a vector space. Then V and {0} are subspaces, called trivial subspaces.
4. The real vector space R has no non-trivial subspace. To check this, let V 6= {0} be a
vector subspace of R. Then, there exists x ∈ R, x 6= 0 such that x ∈ V. Now, using scalar
multiplication, we see that {αx | α ∈ R} ⊆ V. As, x 6= 0, the set {αx | α ∈ R} = R. This
in turn implies that V = R.
8. Is the set of sequences converging to 0 a subspace of the set of all bounded sequences?
Let V(F) be a vector space and W ⊆ V, W 6= ∅. We now prove a result which implies that
to check W to be a subspace, we need to verify only one condition.
4. The commutative and associative laws of vector addition hold as they hold in V.
5. The conditions related with scalar multiplication and the distributive laws also hold as
they hold in V.
Exercise 3.1.10. 1. Prove that a line in R2 is a subspace if and only if it passes through
origin.
3. Does the set V given below form a subspace? Give reasons for your answer.
(f ) W = {A ∈ Mn (R) | AT = 2A}?
9. Among the following, determine the subspaces of the complex vector space Cn ?
(b) {(z1 , z2 , . . . , zn )T | z1 + z2 = z3 }.
Let us recollect that the system Ax = b is either consistent (has a solution) or inconsistent (has no solution). It turns out that consistency of the system Ax = b leads to the idea that the vector b is a linear combination of the columns of A. Let us try to understand this using examples.
Example 3.2.1.
1. Let $A = \begin{pmatrix} 1 & 1\\ 1 & 2\\ 1 & 3 \end{pmatrix}$ and $b = \begin{pmatrix} 2\\ 3\\ 4 \end{pmatrix}$. Then, $\begin{pmatrix} 2\\ 3\\ 4 \end{pmatrix} = \begin{pmatrix} 1\\ 1\\ 1 \end{pmatrix} + \begin{pmatrix} 1\\ 2\\ 3 \end{pmatrix}$. Thus, $\begin{pmatrix} 2\\ 3\\ 4 \end{pmatrix}$ is a linear combination of the vectors in $S = \left\{ \begin{pmatrix} 1\\ 1\\ 1 \end{pmatrix}, \begin{pmatrix} 1\\ 2\\ 3 \end{pmatrix} \right\}$. Similarly, the vector $\begin{pmatrix} 10\\ 16\\ 22 \end{pmatrix}$ is a linear combination of the vectors in S as $\begin{pmatrix} 10\\ 16\\ 22 \end{pmatrix} = 4\begin{pmatrix} 1\\ 1\\ 1 \end{pmatrix} + 6\begin{pmatrix} 1\\ 2\\ 3 \end{pmatrix} = A \begin{bmatrix} 4\\ 6 \end{bmatrix}$.
2. Let $b = \begin{pmatrix} 2\\ 3\\ 5 \end{pmatrix}$. Then, the system Ax = b has no solution as $REF([A\ b]) = \begin{pmatrix} 1 & 1 & 2\\ 0 & 1 & 1\\ 0 & 0 & 1 \end{pmatrix}$.
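Whether b is a linear combination of the columns of A is exactly the question of consistency of Ax = b, which can be tested by comparing Rank(A) with Rank([A | b]). The sketch below is illustrative only (the function name is ours); it checks the two vectors of Example 3.2.1.

```python
import numpy as np

def is_linear_combination(A, b):
    """b lies in the span of the columns of A iff Rank(A) = Rank([A | b])."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float).reshape(-1, 1)
    return np.linalg.matrix_rank(A) == np.linalg.matrix_rank(np.hstack([A, b]))

A = np.array([[1, 1], [1, 2], [1, 3]])
print(is_linear_combination(A, [2, 3, 4]))   # True
print(is_linear_combination(A, [2, 3, 5]))   # False
```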
Definition 3.2.2. Let V be a vector space over F and let S = {u1, . . . , un} ⊆ V. Then, a vector u ∈ V is called a linear combination of elements of S if we can find α1, . . . , αn ∈ F such that
$$u = \alpha_1 u_1 + \cdots + \alpha_n u_n = \sum_{i=1}^{n} \alpha_i u_i.$$
Or equivalently, any vector of the form $\sum_{i=1}^{n} \alpha_i u_i$, where α1, . . . , αn ∈ F, is said to be a linear combination of the elements of S.
Example 3.2.3.
1. (3, 4, 5) is not a linear combination of (1, 1, 1) and (1, 2, 1) as the linear system (3, 4, 5) =
a(1, 1, 1) + b(1, 2, 1), in the unknowns a and b has no solution.
Exercise 3.2.4. 1. Let x ∈ R3 . Prove that xT is a linear combination of (1, 0, 0), (2, 1, 0)
and (3, 3, 1).
Let V be a vector space over F and S a subset of V. We now look at ‘linear span’ of a collection
of vectors. So, here we ask “what is the largest collection of vectors that can be obtained as
linear combination of vectors from S”? Or equivalently, what is the smallest subspace of V that
contains S? We first look at an example for clarity.
Example 3.2.5. Let S = {(1, 0, 0), (1, 2, 0)} ⊆ R3 . We want the largest possible subspace
of R3 which contains vectors of the form α(1, 0, 0), β(1, 2, 0) and α(1, 0, 0) + β(1, 2, 0) for all
possible choices of α, β ∈ R. Note that
1. `1 = {α(1, 0, 0) : α ∈ R} gives the X-axis.
2. `2 = {β(1, 2, 0) : β ∈ R} gives the line passing through (0, 0, 0) and (1, 2, 0).
So, we want the largest subspace of R³ that contains vectors which are formed as the sum of any two points on the two lines ℓ1 and ℓ2. Or, equivalently, the smallest subspace of R³ that contains S.
That is, LS(S) is the set of all possible linear combinations of finitely many vectors of S.
If S is an empty set, we define LS(S) = {0}.
2. V is said to be finite dimensional if there exists a finite set S such that V = LS(S).
3. If there does not exist any finite subset S of V such that V = LS(S) then V is called
infinite dimensional.
3. S = {1 + 2x + 3x2 , 1 + x + 2x2 , 1 + 2x + x3 }.
Solution: To understand LS(S), we need to find condition(s) on α, β, γ, δ such that the
linear system
Note that, for every fixed n ∈ N, R[x; n] is finite dimensional as R[x; n] = LS ({1, x, . . . , xn }).
4. $S = \left\{ I_3,\ \begin{pmatrix} 0 & 1 & 1\\ 1 & 1 & 2\\ 1 & 2 & 0 \end{pmatrix},\ \begin{pmatrix} 0 & 1 & 2\\ 1 & 0 & 2\\ 2 & 2 & 4 \end{pmatrix} \right\} \subseteq M_3(\mathbb{R})$.
Solution: To get the equation, we need to find conditions on the $a_{ij}$'s such that the system
$$\begin{pmatrix} \alpha & \beta + \gamma & \beta + 2\gamma\\ \beta + \gamma & \alpha + \beta & 2\beta + 2\gamma\\ \beta + 2\gamma & 2\beta + 2\gamma & \alpha + 2\gamma \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33} \end{pmatrix},$$
in the unknowns α, β, γ, is always consistent. Now, verify that the required condition equals
$$LS(S) = \left\{ A = [a_{ij}] \in M_3(\mathbb{R}) \;\middle|\; A = A^T,\ a_{11} = \frac{a_{22} + a_{33} - a_{13}}{2},\ a_{12} = \frac{a_{22} - a_{33} + 3a_{13}}{4},\ a_{23} = \frac{a_{22} - a_{33} + 3a_{13}}{2} \right\}.$$
In general, for each fixed m, n ∈ N, the vector space Mm,n (R) is finite dimensional as
Mm,n (R) = LS ({eij : 1 ≤ i ≤ m, 1 ≤ j ≤ n}).
5. C[x] is not finite dimensional as the degree of a polynomial can be any large positive
integer. Indeed, verify that C[x] = LS({1, x, x2 , . . . , xn , . . .}).
Exercise 3.2.8. Determine the equation of the geometrical object represented by LS(S).
1. S = {π} ⊆ R.
4. S = {(1, 0, 1)T , (0, 1, 0)T , (2, 0, 2)T } ⊆ R3 . Give two examples of vectors u, v different
from the given set such that LS(S) = LS(u, v).
9. S = {1, x, x2 , . . .} ⊆ C[x].
Lemma 3.2.9. Let V be a vector space over F with S ⊆ V. Then LS(S) is a subspace of V.
Theorem 3.2.11. Let V be a vector space over F and S ⊆ V. Then LS(S) is the smallest
subspace of V containing S.
Proof. For every u ∈ S, u = 1 · u ∈ LS(S). Thus, S ⊆ LS(S). Need to show that LS(S) is the
smallest subspace of V containing S. So, let W be any subspace of V containing S. Then, by
Exercise 3.2.10, LS(S) ⊆ W and hence the result follows.
Definition 3.2.12. Let V be a vector space over F and S, T be two subsets of V. Then, the
sum of S and T , denoted S + T equals {s + t| s ∈ S, t ∈ T }.
Example 3.2.13.
1. If V = R, S = {0, 1, 2, 3, 4, 5, 6} and T = {5, 10, 15} then S + T = {5, 6, . . . , 21}.
2. If V = R², $S = \left\{\begin{bmatrix}1\\1\end{bmatrix}\right\}$ and $T = \left\{\begin{bmatrix}-1\\1\end{bmatrix}\right\}$ then $S + T = \left\{\begin{bmatrix}0\\2\end{bmatrix}\right\}$.
3. If V = R², $S = \left\{\begin{bmatrix}1\\1\end{bmatrix}\right\}$ and $T = LS\!\left(\begin{bmatrix}-1\\1\end{bmatrix}\right)$ then $S + T = \left\{\begin{bmatrix}1\\1\end{bmatrix} + c\begin{bmatrix}-1\\1\end{bmatrix} \;\middle|\; c \in \mathbb{R}\right\}$.
Lemma 3.2.15. Let P and Q be two subspaces of a vector space V over F. Then P + Q is a
subspace of V. Furthermore, P + Q is the smallest subspace of V containing both P and Q.
5. Let S = {x1 , x2 , x3 , x4 }, where x1 = (1, 0, 0)T , x2 = (1, 1, 0)T , x3 = (1, 2, 0)T and x4 =
(1, 1, 1)T . Then, determine all xi such that LS(S) = LS(S \ {xi }).
6. Let W = LS((1, 0, 0)T , (1, 1, 0)T ) and U = LS((1, 1, 1)T ). Prove that W + U = R3 and
W ∩ U = {0}. If v ∈ R3 , determine w ∈ W and u ∈ U such that v = w + u. Is it
necessary that w and u are unique?
7. Let W = LS((1, −1, 0), (1, 1, 0)) and U = LS((1, 1, 1), (1, 2, 1)). Prove that W + U = R3
and W ∩ U 6= {0}. Find v ∈ R3 such that v = w + u, for 2 different choices of w ∈ W
and u ∈ U. Thus, the choice of vectors w and u is not unique.
8. Let S = {(1, 1, 1, 1)T , (1, −1, 1, 2)T , (1, 1, −1, 1)T } ⊆ R4 . Does (1, 1, 2, 1)T ∈ LS(S)? Fur-
thermore, determine conditions on x, y, z and u such that (x, y, z, u)T ∈ LS(S).
Example 3.3.1.
1. Let $A = \begin{pmatrix} 1 & 1\\ 1 & 2\\ 1 & 3 \end{pmatrix}$. Then Ax = 0 has only the trivial solution. So, we say that the columns of A are linearly independent. Thus, the set $S = \left\{ \begin{pmatrix} 1\\ 1\\ 1 \end{pmatrix}, \begin{pmatrix} 1\\ 2\\ 3 \end{pmatrix} \right\}$, consisting of the columns of A, is linearly independent.
2. Let $A = \begin{pmatrix} 1 & 1 & 2\\ 1 & 2 & 3\\ 1 & 3 & 5 \end{pmatrix}$. As $REF(A) = \begin{pmatrix} 1 & 1 & 2\\ 0 & 1 & 1\\ 0 & 0 & 1 \end{pmatrix}$, Ax = 0 has only the trivial solution. Hence, the set $S = \left\{ \begin{pmatrix} 1\\ 1\\ 1 \end{pmatrix}, \begin{pmatrix} 1\\ 2\\ 3 \end{pmatrix}, \begin{pmatrix} 2\\ 3\\ 5 \end{pmatrix} \right\}$, consisting of the columns of A, is linearly independent.
3. Let $A = \begin{pmatrix} 1 & 1 & 2\\ 1 & 2 & 3\\ 1 & 3 & 4 \end{pmatrix}$. As $REF(A) = \begin{pmatrix} 1 & 1 & 2\\ 0 & 1 & 1\\ 0 & 0 & 0 \end{pmatrix}$, Ax = 0 has a non-trivial solution. Hence, the set $S = \left\{ \begin{pmatrix} 1\\ 1\\ 1 \end{pmatrix}, \begin{pmatrix} 1\\ 2\\ 3 \end{pmatrix}, \begin{pmatrix} 2\\ 3\\ 4 \end{pmatrix} \right\}$, consisting of the columns of A, is linearly dependent.
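As the example suggests, linear independence of the columns of A amounts to Ax = 0 having only the trivial solution, i.e., to Rank(A) equalling the number of columns. A small illustrative check of parts 2 and 3 (the function name is ours):

```python
import numpy as np

def columns_independent(A):
    """Columns are linearly independent iff Rank(A) equals the number of columns."""
    A = np.asarray(A, dtype=float)
    return np.linalg.matrix_rank(A) == A.shape[1]

print(columns_independent([[1, 1, 2], [1, 2, 3], [1, 3, 5]]))   # True
print(columns_independent([[1, 1, 2], [1, 2, 3], [1, 3, 4]]))   # False
```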
α1 u1 + α2 u2 + · · · + αm um = 0, (3.3.1)
in the unknowns αi ’s, 1 ≤ i ≤ m, has only the trivial solution. If Equation (3.3.1) has a
non-trivial solution then S is said to be linearly dependent. If S has infinitely many vectors
then S is said to be linearly independent if for every finite subset T of S, T is linearly
independent.
Observe that we are solving a linear system over F. Hence, whether a set is linearly inde-
pendent or linearly dependent depends on the set of scalars.
Example 3.3.3.
1. Consider C² as a vector space over R. Let S = {(1, 2)ᵀ, (i, 2i)ᵀ}. Then, the linear system a · (1, 2)ᵀ + b · (i, 2i)ᵀ = (0, 0)ᵀ, in the unknowns a, b ∈ R, has only the trivial solution, namely a = b = 0. So, S is a linearly independent subset of the vector space C² over R.
2. Consider C² as a vector space over C. Then S = {(1, 2)ᵀ, (i, 2i)ᵀ} is a linearly dependent subset of the vector space C² over C as a = −i and b = 1 is a non-trivial solution.
3. Let V be the vector space of all real valued continuous functions with domain [−π, π].
Then V is a vector space over R. Question: What can you say about the linear indepen-
dence or dependence of the set S = {1, sin(x), cos(x)}?
Solution: For all x ∈ [−π, π], consider the system
$$\begin{bmatrix} 1 & \sin(x) & \cos(x) \end{bmatrix} \begin{bmatrix} a\\ b\\ c \end{bmatrix} = 0 \;\Leftrightarrow\; a \cdot 1 + b \cdot \sin(x) + c \cdot \cos(x) = 0, \qquad (3.3.2)$$
in the unknowns a, b and c. Even though we seem to have only one linear system, we can obtain the following two linear systems (the first using differentiation and the second using evaluation at 0, π/2 and π of the domain):
$$\begin{aligned} a + b \sin x + c \cos x &= 0\\ 0 \cdot a + b \cos x - c \sin x &= 0\\ 0 \cdot a - b \sin x - c \cos x &= 0 \end{aligned} \qquad \text{or} \qquad \begin{aligned} a + c &= 0\\ a + b &= 0\\ a - c &= 0. \end{aligned}$$
Clearly, the above systems have only the trivial solution. Hence, S is linearly independent.
4. Let $A \in M_{m,n}(\mathbb{C})$. If Rank(A) < m then the rows of A are linearly dependent.
Solution: As Rank(A) < m, there exists an invertible matrix P such that $PA = \begin{bmatrix} C\\ 0 \end{bmatrix}$. Thus, $0^T = (PA)[m, :] = \sum_{i=1}^{m} p_{mi} A[i, :]$. As P is invertible, at least one $p_{mi} \neq 0$. Thus, the required result follows.
5. Let A ∈ Mm,n (C). If Rank(A) < n then, the columns of A are linearly dependent.
Solution: As Rank(A) < n the system Ax = 0 has a non-trivial solution.
6. Let S = {0}. Is S linearly independent?
Solution: Let u = 0. So, consider the system αu = 0. This has a non-trivial solution
α = 1 as 1 · 0 = 0.
7. Let $S = \left\{ \begin{bmatrix} 0\\ 0 \end{bmatrix}, \begin{bmatrix} 1\\ 2 \end{bmatrix} \right\}$. Then Ax = 0 corresponds to $A = \begin{bmatrix} 0 & 1\\ 0 & 2 \end{bmatrix}$. This has a non-trivial solution $x = \begin{bmatrix} 1\\ 0 \end{bmatrix}$. Hence, S is linearly dependent.
8. Let $S = \left\{ \begin{bmatrix} 1\\ 2 \end{bmatrix} \right\}$. Is S linearly independent?
Solution: Let $u = \begin{bmatrix} 1\\ 2 \end{bmatrix}$. Then the system αu = 0 has only the trivial solution. Hence S is linearly independent.
So, we observe that 0, the zero-vector cannot belong to any linearly independent set. Fur-
ther, a set consisting of a single non-zero vector is linearly independent.
Exercise 3.3.4. 1. Show that S = {(1, 2, 3)T , (−2, 1, 1)T , (8, 6, 10)T } ⊆ R3 is linearly de-
pendent.
2. Let A ∈ Mn (R). Suppose x, y ∈ Rn \ {0} such that Ax = 3x and Ay = 2y. Then, prove
that x and y are linearly independent.
3. Let $A = \begin{pmatrix} 2 & 1 & 3\\ 4 & -1 & 3\\ 3 & -2 & 5 \end{pmatrix}$. Determine x, y, z ∈ R³ \ {0} such that Ax = 6x, Ay = 2y and
Az = −2z. Use the vectors x, y and z obtained above to prove the following.
We now prove a couple of results which will be very useful in the next section.
Lemma 3.3.7. Let S be a linearly independent subset of a vector space V over F. Then, each
v ∈ LS(S) is a unique linear combination of vectors from S.
Proof. Suppose there exists v ∈ LS(S) with v ∈ LS(T1 ), LS(T2 ) with T1 , T2 ⊆ S. Let T1 =
{v1 , . . . , vk } and T2 = {w1 , . . . , w` }, for some vi ’s and wj ’s in S. Define T = T1 ∪ T2 . Then,
T is a subset of S. Hence, using Proposition 3.3.5, the set T is linearly independent. Let T =
{u1 , . . . , up }. Then, there exist αi ’s and βj ’s in F, not all zero, such that v = α1 u1 + · · · + αp up
as well as v = β1 u1 + · · · + βp up . Equating the two expressions for v gives
So,
$$\begin{bmatrix} w_1\\ \vdots\\ w_m \end{bmatrix} = \begin{bmatrix} a_{11}u_1 + \cdots + a_{1k}u_k\\ \vdots\\ a_{m1}u_1 + \cdots + a_{mk}u_k \end{bmatrix} = \begin{bmatrix} a_{11} & \cdots & a_{1k}\\ \vdots & \ddots & \vdots\\ a_{m1} & \cdots & a_{mk} \end{bmatrix} \begin{bmatrix} u_1\\ \vdots\\ u_k \end{bmatrix}.$$
As m > k, the homogeneous system $A^T x = 0$ has a non-trivial solution, say $y \neq 0$, i.e., $A^T y = 0 \Leftrightarrow y^T A = 0^T$. Thus,
$$y^T \begin{bmatrix} w_1\\ \vdots\\ w_m \end{bmatrix} = y^T A \begin{bmatrix} u_1\\ \vdots\\ u_k \end{bmatrix} = (y^T A) \begin{bmatrix} u_1\\ \vdots\\ u_k \end{bmatrix} = 0^T \begin{bmatrix} u_1\\ \vdots\\ u_k \end{bmatrix} = 0^T.$$
Proof. Observe that Rn = LS({e1 , . . . , en }), where ei = In [:, i], is the i-th column of In . Hence,
using Theorem 3.3.8, the required result follows.
Theorem 3.3.10. Let S be a linearly independent subset of a vector space V over F. Then, for
any v ∈ V the set S ∪ {v} is linearly dependent if and only if v ∈ LS(S).
Proof. Let us assume that S ∪ {v} is linearly dependent. Then, there exist vi ’s in S such that
the linear system
α1 v1 + · · · + αp vp + αp+1 v = 0 (3.3.4)
We now state a very important corollary of Theorem 3.3.10 without proof. This result can
also be used as an alternative definition of linear independence and dependence.
Corollary 3.3.11. Let V be a vector space over F and let S be a subset of V containing a
non-zero vector u1 .
1. If S is linearly dependent then, there exists k such that LS(u1 , . . . , uk ) = LS(u1 , . . . , uk−1 ).
Or equivalently, if S is a linearly dependent set then there exists a vector uk , for k ≥ 2,
which is a linear combination of the previous vectors.
As an application, we have the following result about finite dimensional vector spaces. We
leave the proof for the reader as it directly follows from Corollary 3.3.11 and the idea that an
In this subsection, we use the understanding of vector spaces to relate the rank of a matrix
with linear independence and dependence of rows and columns of a matrix. We start with our
understanding of the RREF.
every row of A is a linear combination of the r-rows of B. Hence, using Theorem 3.3.8 any
due to pivotal 1’s. As B = RREF(A), there exists an invertible matrix P such that B = P A.
Then, the corresponding columns of A satisfy
3. Also, note that during the application of GJE, the 3-rd and 4-th rows were interchanged.
Hence, the rows A[1, :], A[2, :] and A[4, :] are linearly independent.
(b) If S is linearly independent then prove that T is linearly independent for every in-
vertible matrix A.
(c) If T is linearly independent then S is linearly independent. Further, in this case, the
matrix A is necessarily invertible.
Example 3.4.2. Let T = {2, 3, 4, 7, 8, 10, 12, 13, 14, 15}. Then, a maximal subset of T of
consecutive integers is S = {2, 3, 4}. Other maximal subsets are {7, 8}, {10} and {12, 13, 14, 15}.
Note that {12, 13} is not maximal. Why?
Definition 3.4.3. Let V be a vector space over F. Then, S is called a maximal linearly
independent subset of V if
1. S is linearly independent and
2. no proper superset of S in V is linearly independent.
Example 3.4.4.
1. In R3 , the set S = {e1 , e2 } is linearly independent but not maximal as S ∪ {(1, 1, 1)T } is
a linearly independent set containing S.
2. In R3 , S = {(1, 0, 0)T , (1, 1, 0)T , (1, 1, −1)T } is a maximal linearly independent set as S is
linearly independent and any collection of 4 or more vectors from R3 is linearly dependent
(see Corollary 3.3.9).
3. Let S = {v1, . . . , vk} ⊆ Rⁿ. Now, form the matrix A = [v1, . . . , vk] and let B = RREF(A). Then, using Theorem 3.3.14, we see that if B[:, i1], . . . , B[:, ir] are the pivotal columns of B then {v_{i1}, . . . , v_{ir}} is a maximal linearly independent subset of S (a computational sketch appears after this example).
4. Is the set {1, x, x2 , . . .} a maximal linearly independent subset of C[x] over C?
5. Is the set {eij | 1 ≤ i ≤ m, 1 ≤ j ≤ n} a maximal linearly independent subset of Mm,n (C)
over C?
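A computational way to extract a maximal linearly independent subset, as in part 3 above, is to keep a vector exactly when it is not in the span of the vectors already kept; this reproduces the pivotal columns. The sketch below is illustrative only (the function name is ours).

```python
import numpy as np

def maximal_independent_subset(vectors):
    """Greedily keep a vector if it is not in the span of those already kept."""
    kept = []
    for v in vectors:
        candidate = kept + [np.asarray(v, dtype=float)]
        if np.linalg.matrix_rank(np.column_stack(candidate)) == len(candidate):
            kept = candidate
    return kept

S = [(1, 0, 0), (1, 1, 0), (1, 2, 0), (1, 1, 1)]
print(maximal_independent_subset(S))   # keeps (1,0,0), (1,1,0) and (1,1,1)
```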
Theorem 3.4.5. Let V be a vector space over F and S a linearly independent set in V. Then,
S is maximal linearly independent if and only if LS(S) = V.
Proof. Let v ∈ V. As S is linearly independent, using Corollary 3.3.11.2, the set S ∪ {v} is
linearly independent if and only if v ∈ V \ LS(S). Thus, the required result follows.
Let V = LS(S) for some set S with | S | = k. Then, using Theorem 3.3.8, we see that if
T ⊆ V is linearly independent then | T | ≤ k. Hence, a maximal linearly independent subset
of V can have at most k vectors. Thus, we arrive at the following important result.
Theorem 3.4.6. Let V be a vector space over F and let S and T be two finite maximal linearly independent subsets of V. Then |S| = |T|.
Proof. By Theorem 3.4.5, S and T are maximal linearly independent if and only if LS(S) =
V = LS(T ). Now, use the previous paragraph to get the required result.
Let V be a finite dimensional vector space. Then, by Theorem 3.4.6, the number of vectors
in any two maximal linearly independent set is the same. We use this number to now define
the dimension of a vector space.
Definition 3.4.7. Let V be a finite dimensional vector space over F. Then, the number of
vectors in any maximal linearly independent set is called the dimension of V, denoted dim(V).
By convention, dim({0}) = 0.
Example 3.4.8.
1. As {1} is a maximal linearly independent subset of R, dim(R) = 1.
2. As {e1 , . . . , en } is a maximal linearly independent subset in Rn , dim(Rn ) = n.
3. As {e1 , . . . , en } is a maximal linearly independent subset in Cn over C, dim(Cn ) = n.
4. Using Exercise 3.3.13.4, {e1 , . . . , en , ie1 , . . . , ien } is a maximal linearly independent subset
in Cn over R. Thus, as a real vector space, dim(Cn ) = 2n.
5. As {eij | 1 ≤ i ≤ m, 1 ≤ j ≤ n} is a maximal linearly independent subset of Mm,n (C) over
C, dim(Mm,n (C)) = mn.
Definition 3.4.9. Let V be a finite dimensional vector space over F. Then, a maximal linearly
independent subset of V is called a basis of V. The vectors in a basis are called basis vectors.
By convention, a basis of {0} is the empty set.
Thus, using Theorem 3.3.12 we see that every finite dimensional vector space has a basis.
Remark 3.4.10 (Standard Basis). The readers should verify the statements given below.
1. All the maximal linearly independent set given in Example 3.4.8 form the standard basis
of the respective vector space.
3. Fix a positive integer n. Then {1, x, x2 , . . . , xn } is the standard basis of R[x; n] over R.
5. Let V = {A ∈ Mn (R) | AT = −A}. Then, V is a vector space over R with standard basis
{eij − eji | 1 ≤ i < j ≤ n}.
Definition 3.4.11. Let V be a vector space over F. Then, a subset S of V is called minimal
spanning if LS(S) = V and no proper subset of S spans V.
Example 3.4.12.
5. Let S = {a1, . . . , an}. Then, $\mathbb{R}^S$ is a real vector space (see Example 3.1.4.7). For 1 ≤ i ≤ n, define the functions
$$e_i(a_j) = \begin{cases} 1 & \text{if } j = i,\\ 0 & \text{otherwise.} \end{cases}$$
Theorem 3.4.13. Let V be a non-zero vector space over F. Then, the following statements are
equivalent.
1. B is a basis (maximal linearly independent subset) of V.
2. B is linearly independent and spans V.
3. B is a minimal spanning set in V.
Remark 3.4.14. Let B = {v1, . . . , vn} be a basis of a vector space V over F. Then, for each v ∈ V, there exist unique scalars c1, . . . , cn ∈ F such that $v = \sum_{i=1}^{n} c_i v_i$.
The next result is generally known as “every linearly independent set can be extended to
form a basis of a finite dimensional vector space”. Also, recall Theorem 3.3.12.
Theorem 3.4.15. Let V be a vector space over F with dim(V) = n. If S is a linearly independent
subset of V then there exists a basis T of V such that S ⊆ T .
Proof. If LS(S) = V, done. Else, choose u1 ∈ V \ LS(S). Thus, by Corollary 3.3.11.2, the set
S ∪{u1 } is linearly independent. We repeat this process till we get n vectors in T as dim(V) = n.
By Theorem 3.4.13, this T is indeed a required basis.
We end this section with an algorithm which is based on the proof of the previous theorem.
4. Let {v1, . . . , vn} be a basis of Cⁿ. Then, prove that the two matrices B = [v1, . . . , vn] and $C = \begin{bmatrix} v_1^T\\ \vdots\\ v_n^T \end{bmatrix}$ are invertible.
5. Let W1 and W2 be two subspaces of a finite dimensional vector space V such that W1 ⊆ W2 .
Then, prove that W1 = W2 if and only if dim(W1 ) = dim(W2 ).
6. Let W1 be a subspace of a finite dimensional vector space V over F. Then, prove that there exists a subspace W2 of V such that W1 + W2 = V and W1 ∩ W2 = {0}.
Also, prove that for each v ∈ V there exist unique vectors w1 ∈ W1 and w2 ∈ W2 with
v = w1 + w2 . The subspace W2 is called the complementary subspace of W1 in V and
we write V = W1 ⊕ W2 .
7. Let V be a finite dimensional vector space over F. If W1 and W2 are two subspaces of V
such that W1 ∩W2 = {0} and dim(W1 )+dim(W2 ) = dim(V) then prove that W1 +W2 = V.
8. Consider the vector space C([−π, π]) over R. For each n ∈ N, define en (x) = sin(nx).
Then, prove that S = {en | n ∈ N} is linearly independent. [Hint: Need to show that every
finite subset of S is linearly independent. So, on the contrary assume that there exists ` ∈ N and
functions ek1 , . . . , ek` such that α1 ek1 + · · · + α` ek` = 0, for some αt 6= 0 with 1 ≤ t ≤ `. But,
the above system is equivalent to looking at α1 sin(k1 x) + · · · + α` sin(k` x) = 0 for all x ∈ [−π, π].
Now in the integral
$$\int_{-\pi}^{\pi} \sin(mx)\big(\alpha_1 \sin(k_1 x) + \cdots + \alpha_\ell \sin(k_\ell x)\big)\, dx = \int_{-\pi}^{\pi} \sin(mx) \cdot 0\, dx = 0,$$
replace m with ki ’s to show that αi = 0, for all i, 1 ≤ i ≤ `. This gives the required contradiction.]
9. Is the set {1, sin(x), cos(x), sin(2x), cos(2x), sin(3x), cos(3x), . . .} a linearly independent subset of the vector space C([−π, π], R) over R?
10. Find a basis of R3 containing the vector (1, 1, −2)T and (1, 2, −1)T .
14. Let uT = (1, 1, −2), vT = (−1, 2, 3) and wT = (1, 10, 1). Find a basis of LS(u, v, w).
Determine a geometrical representation of LS(u, v, w).
15. Is the set W = {p(x) ∈ R[x; 4] | p(−1) = p(1) = 0} a subspace of R[x; 4]? If yes, find its
dimension.
Definition 3.5.1. Let $A \in M_{m,n}(\mathbb{R})$. Then, we define the four fundamental subspaces associated with A as
1. Col(A) = {Ax | x ∈ Rn } is a subspace of Rm , called the Column space, and is the
linear span of the columns of A.
2. Row(A) = Col(AT ) = {AT x | x ∈ Rm } is a subspace of Rn , called the row space of A
and is the linear span of the rows of A.
3. Null(A) = {x ∈ Rn | Ax = 0}, called the Null space of A.
4. Null(AT ) = {x ∈ Rm | AT x = 0}, also called the left-null space.
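Orthonormal bases of all four fundamental subspaces can be read off from the singular value decomposition of A, used here purely as a computational device. The sketch below is illustrative only (the function name is ours) and relies on NumPy.

```python
import numpy as np

def fundamental_subspaces(A, tol=1e-10):
    """Orthonormal bases of Col(A), Row(A), Null(A), Null(A^T) from the SVD."""
    A = np.asarray(A, dtype=float)
    U, s, Vt = np.linalg.svd(A)
    r = int(np.sum(s > tol))                 # Rank(A)
    return {"Col(A)":    U[:, :r],
            "Row(A)":    Vt[:r, :].T,
            "Null(A)":   Vt[r:, :].T,
            "Null(A^T)": U[:, r:]}

A = np.array([[1, 1, 1, -2], [1, 2, -1, 1], [1, -2, 7, -11]], dtype=float)
for name, basis in fundamental_subspaces(A).items():
    print(name, "has dimension", basis.shape[1])
```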
Example 3.5.3.
1. Compute the fundamental subspaces for $A = \begin{pmatrix} 1 & 1 & 1 & -2\\ 1 & 2 & -1 & 1\\ 1 & -2 & 7 & -11 \end{pmatrix}$.
Solution: Verify the following
Remark 3.5.4. Let A ∈ Mm,n (R). Then, in Example 3.5.3, observe that the direction ratios
of normal vectors of Col(A) matches with vector in Null(AT ). Similarly, the direction ratios
of normal vectors of Row(A) matches with vectors in Null(A). Are these true in the general
setting?
Exercise 3.5.5. 1. For the matrices given below, determine the four fundamental subspaces. Further, find the dimensions of all the vector subspaces so obtained.
$$A = \begin{pmatrix} 1 & 2 & 1 & 3 & 2\\ 0 & 2 & 2 & 2 & 4\\ 2 & -2 & 4 & 0 & 8\\ 4 & 2 & 5 & 6 & 10 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 2 & 4 & 0 & 6\\ -1 & 0 & -2 & 5\\ -3 & -5 & 1 & -4\\ -1 & -1 & 1 & 2 \end{pmatrix}.$$
The next result is a re-writing of the results on system of linear equations. The readers are
advised to provide the proof for clarity.
Let W1 and W1 be two subspaces of a vector space V over F. Then, recall that (see
Exercise 3.2.16.4d) W1 + W2 = {u + v | u ∈ W1 , v ∈ W2 } = LS(W1 ∪ W2 ) is the smallest
subspace of V containing both W1 and W2 . We now state a result similar to a result in Venn
diagram that states | A | + | B | = | A ∪ B | + | A ∩ B |, whenever the sets A and B are
finite (for a proof, see Appendix 9.4.1).
Theorem 3.5.7. Let V be a finite dimensional vector space over F. If W1 and W2 are two
subspaces of V then
For better understanding, we give an example for finite subsets of Rn . The example uses
Theorem 3.3.14 to obtain bases of LS(S), for different choices S. The readers are advised to
see Example 3.3.14 before proceeding further.
Putting the given spanning vectors as the rows of a matrix and applying GJE, the nonzero rows of the resulting RREFs are (1, 2, 0, 1, 2), (0, 0, 1, 0, −1), (0, 1, 0, 0, 0), (0, 0, 0, 1, 3) in the first case and (1, 2, 0, 1, 2), (0, 0, 1, 0, −1), (0, 1, 0, 0, 1) in the second.
Thus, a required basis of V is {(1, 2, 0, 1, 2)T , (0, 0, 1, 0, −1)T , (0, 1, 0, 0, 0)T , (0, 0, 0, 1, 3)T }. Sim-
ilarly, a required basis of W is {(1, 2, 0, 1, 2)T , (0, 0, 1, 0, −1)T , (0, 1, 0, 0, 1)T }.
Exercise 3.5.9. 1. Give an example to show that if A and B are equivalent then Col(A)
need not equal Col(B).
3. Let W1 and W2 be two subspaces of a vector space V. If dim(W1 ) + dim(W2 ) > dim(V),
then prove that dim(W1 ∩ W2 ) ≥ 1.
So, D = {Aur+1 , . . . , Aun } spans Col(A). We further need to show that D is linearly indepen-
dent. So, consider the homogeneous linear system given below in the unknowns α1 , . . . , αn−r .
In other words, we have shown that the only solution of Equation (3.6.3) is the trivial solution.
Hence, {Aur+1 , . . . , Aun } is a basis of Col(A). Thus, the required result follows.
Theorem 3.6.1 is part of what is known as the fundamental theorem of linear algebra (see
Theorem 3.6.5). The following are some of the consequences of the rank-nullity theorem. The
proofs are left as an exercise for the reader.
(f ) There exists a linearly independent subset {b1 , . . . , bk } of Rm such that the system
Ax = bi , for 1 ≤ i ≤ k, is consistent.
(g) dim(Null(A)) = n − k.
3. Let A ∈ Mn (R) and define a function f : Rn → Rn by f (x) = Ax. Then, the following
statements are equivalent.
(a) f is one-one.
(b) f is onto.
(c) f is invertible.
4. Let $A = \begin{bmatrix} 1 & -1\\ 1 & -1 \end{bmatrix}$. Then, verify that Null(A) = Col(A). Can such examples exist in Rⁿ
for n odd? What about n even? Further, verify that R2 6= Null(A) + Col(A). Does it
contradict the rank-nullity theorem?
We end this section by proving the fundamental theorem of linear algebra. We start with
the following result.
So, let x ∈ Null(AT A). Then, (AT A)x = 0 implies (Ax)T (Ax) = xT AT Ax = xT 0 = 0.
Lemma 3.6.4. Consider the vector space Rn . Then, for S ⊆ Rn prove that
1. S ⊥ is a subspace of Rn .
2. S ⊥ = (LS(S))⊥ .
3. (S ⊥ )⊥ = S ⊥ if and only if S is a subspace of Rn .
4. Let W be a subspace of Rn . Then, there exists a subspace V of Rn such that
(a) Rn = W ⊕ V. Or equivalently, W and V are complementary subspaces.
(b) vT u = 0, for every u ∈ W and v ∈ V. This, further implies that W and V are also
orthogonal to each other. Such spaces are called orthogonal complements.
Theorem 3.6.5 (Fundamental Theorem of Linear Algebra). Let A ∈ Mm,n (R). Then,
1. dim(Null(A)) + dim(Col(A)) = n.
2. $Null(A) = \big(Col(A^T)\big)^{\perp}$ and $Null(A^T) = \big(Col(A)\big)^{\perp}$.
0 = (AT y)T z = yT Az = yT y ⇔ y = 0.
Thus Az = 0 and z ∈ Null(A). This completes the proof of the first equality in Part 2. A
similar argument gives the second equality.
Part 3: Note that, using the rank-nullity theorem, we have
$$\dim(Col(A)) = n - \dim(Null(A)) = n - \dim\big(\big(Col(A^T)\big)^{\perp}\big) = n - \big(n - \dim(Col(A^T))\big).$$
Remark 3.6.6. Let A ∈ Mm,n (R). Then, Theorem 3.6.5.2 implies the following:
1. $Null(A) = \big(Col(A^T)\big)^{\perp}$. This is just stating the usual fact that if x ∈ Null(A) then Ax = 0. Hence, the dot product of every row of A with x equals 0.
2. $\mathbb{R}^n = Null(A) \oplus Col(A^T)$. Further, Null(A) is the orthogonal complement of Col(Aᵀ).
3. $\mathbb{R}^m = Null(A^T) \oplus Col(A)$. Further, Null(Aᵀ) is the orthogonal complement of Col(A).
As an implication of last two parts of Theorem 3.6.5, we show the existence of an invertible
function f : Col(AT ) → Col(A).
Corollary 3.6.7. Let A ∈ Mm,n (R). Then, the function f : Col(AT ) → Col(A) defined by
f (x) = Ax is invertible.
Proof. Let us first show that f is one-one. So, let x, y ∈ Col(AT ) such that f (x) = f (y).
Hence, Ax = Ay. Thus x − y ∈ Null(A) = (Col(AT ))⊥ (by Theorem 3.6.5.2). Therefore,
x − y ∈ (Col(AT ))⊥ ∩ Col(AT ) = {0}. Thus x = y and hence f is one-one.
We now show that f is onto. So, let z ∈ Col(A). To find y ∈ Col(AT ) such that f (y) = z.
As z ∈ Col(A) there exists w ∈ Rn with z = Aw. But Null(A) and Col(AT ) are
complementary subspaces and hence, there exists unique vectors, w1 ∈ Null(A) and w2 ∈
Col(AT ), such that w = w1 + w2 . Thus, z = Aw implies
The readers should look at Example 3.5.3 and Remark 3.5.4. We give one more example.
Example 3.6.8. Let $A = \begin{pmatrix} 1 & 1 & 0\\ 2 & 1 & 1\\ 3 & 2 & 1 \end{pmatrix}$. Then, verify that
1. {(0, 1, 1)T , (1, 1, 2)T } is a basis of Col(A).
2. {(1, 1, −1)T } is a basis of Null(AT ).
3. Null(AT ) = (Col(A))⊥ .
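The claims of Example 3.6.8 are easy to confirm numerically: the vector (1, 1, −1)ᵀ is annihilated by Aᵀ and is therefore orthogonal to every column of A. A short illustrative check:

```python
import numpy as np

A = np.array([[1, 1, 0], [2, 1, 1], [3, 2, 1]], dtype=float)
y = np.array([1, 1, -1], dtype=float)    # claimed basis vector of Null(A^T)

print(A.T @ y)    # zero vector, so y lies in Null(A^T)
print(y @ A)      # same computation: y is orthogonal to every column of A
```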
For more information related with the fundamental theorem of linear algebra the interested
readers are advised to see the article “The Fundamental Theorem of Linear Algebra, Gilbert
Strang, The American Mathematical Monthly, Vol. 100, No. 9, Nov., 1993, pp. 848 - 855.” The
diagram 3.6 has been taken from the above paper. It also explains Corollary 3.6.7.
Exercise 3.6.9. 1. Find subspaces W1 ≠ {0} and W2 ≠ {0} in R³ such that they are
orthogonal but they are not orthogonal complement of each other.
2. Let A ∈ Mm,n (R). Prove that Col(AT ) = Col(AT A). Thus, Rank(A) = n if and only if
Rank(AT A) = n. [ Hint: Use the rank-nullity theorem and/ or Lemma 3.6.3]
3. Let A ∈ Mm,n (R). Then, for every
(a) x ∈ Rn , x = u + v, where u ∈ Col(AT ) and v ∈ Null(A) are unique.
(b) y ∈ Rm , y = w + z, where w ∈ Col(A) and z ∈ Null(AT ) are unique.
3.7 Summary
In this chapter, we defined vector spaces over F. The set F was either R or C. To define a vector
space, we start with a non-empty set V of vectors and F the set of scalars. We also needed to
do the following:
If all conditions in Definition 3.1.1 are satisfied then V is a vector space over F. If W is a non-empty subset of a vector space V over F then, for W to be a subspace, we only need to check whether the vector addition and scalar multiplication inherited from V hold in W.
We then learnt about linear combinations of vectors and the linear span of vectors. It was also shown that the linear span of a subset S of a vector space V is the smallest subspace of V containing S.
1. A is invertible.
3. RREF(A) = In .
7. Rank(A) = n.
8. det(A) 6= 0.
9. Col(AT ) = Row(A) = Rn .
11. Col(A) = Rn .
The dot product helped us to compute the length of vectors and talk of perpendicularity of
vectors. We now generalize the idea of the dot product to achieve a similar goal for a general vector
space over R or C.
Definition 4.1.1. Let V be a vector space over F. An inner product over V, denoted by
h , i, is a map from V × V to F satisfying
2. $\langle u, v\rangle = \overline{\langle v, u\rangle}$, the complex conjugate of $\langle v, u\rangle$, for all u, v ∈ V, and
Remark 4.1.2. Using the definition of inner product, we immediately observe that
1. hu, 0i = hu, 0 + 0i = hu, 0i + hu, 0i. Thus, hu, 0i = 0, for all u ∈ V.
2. $\langle v, \alpha w\rangle = \overline{\langle \alpha w, v\rangle} = \overline{\alpha \langle w, v\rangle} = \overline{\alpha}\,\langle v, w\rangle$, for all α ∈ F and v, w ∈ V.
3. If hu, vi = 0 for all v ∈ V then in particular hu, ui = 0. Hence u = 0.
Definition 4.1.3. Let V be a vector space with an inner product h , i. Then, (V, h , i) is called
an inner product space (in short, ips).
Example 4.1.4. Examples 1 to 4 that appear below are called the standard inner product
or the dot product. Whenever an inner product is not clearly mentioned, it will be assumed
to be the standard inner product.
(d) $\langle \alpha f, g\rangle = \int_{-1}^{1} (\alpha f(x)) g(x)\, dx = \alpha \int_{-1}^{1} f(x) g(x)\, dx = \alpha \langle f, g\rangle$.
5. For $x = (x_1, x_2)^T$, $y = (y_1, y_2)^T \in \mathbb{R}^2$ and $A = \begin{bmatrix} 4 & -1\\ -1 & 2 \end{bmatrix}$, define $\langle x, y\rangle = y^T A x$. Then, $\langle\,,\,\rangle$ is an inner product as $\langle x, x\rangle = (x_1 - x_2)^2 + 3x_1^2 + x_2^2$.
6. Fix $A = \begin{bmatrix} a & b\\ b & c \end{bmatrix}$ with a, c > 0 and $ac > b^2$. Then, $\langle x, y\rangle = y^T A x$ is an inner product on $\mathbb{R}^2$ as $\langle x, x\rangle = a x_1^2 + 2b x_1 x_2 + c x_2^2 = a\left(x_1 + \frac{b x_2}{a}\right)^2 + \frac{1}{a}\left(ac - b^2\right) x_2^2$.
Exercise 4.1.5. For x = (x1 , x2 )T , y = (y1 , y2 )T ∈ R2 , we define three maps that satisfy at
least one condition out of the three conditions for an inner product. Determine the condition
which is not satisfied. Give reasons for your answer.
1. hx, yi = x1 y1 .
As hu, ui > 0, for all u 6= 0, we use inner product to define the length/ norm of a vector.
Definition 4.1.6. Let V be an inner product space over F. Then, for any vector u ∈ V, we define the length (norm) of u, denoted $\|u\| = \sqrt{\langle u, u\rangle}$, the positive square root. A vector of norm 1 is called a unit vector. Thus, $\frac{u}{\|u\|}$ is called the unit vector in the direction of u.
Example 4.1.7.
1. Let V be an ips and u ∈ V. Then, for any scalar α, $\|\alpha u\| = |\alpha| \cdot \|u\|$.
2. Let $u = (1, -1, 2, -3)^T \in \mathbb{R}^4$. Then, $\|u\| = \sqrt{1 + 1 + 4 + 9} = \sqrt{15}$. Thus, $\frac{1}{\sqrt{15}} u$ and $-\frac{1}{\sqrt{15}} u$ are unit vectors in the direction of u.
Exercise 4.1.8. 1. Let u = (−1, 1, 2, 3, 7)T ∈ R5 . Find all α ∈ R such that kαuk = 1.
a b
that kuk = 1, kvk = 1 and hu, vi = 0? [Hint: Let A = and define hx, yi = yT Ax.
b c
Use given conditions to get a linear system of 3 equations in the variables a, b, c.]
A very useful and fundamental inequality, commonly called the Cauchy-Schwartz inequality, is stated and proved next.
Theorem 4.2.1 (Cauchy-Schwartz inequality). Let V be an inner product space over F. Then, for any u, v ∈ V,
$$|\langle u, v\rangle| \leq \|u\|\,\|v\|. \qquad (4.2.1)$$
Moreover, equality holds in Inequality (4.2.1) if and only if u and v are linearly dependent. In particular, if u ≠ 0 then $v = \left\langle v, \frac{u}{\|u\|}\right\rangle \frac{u}{\|u\|}$.
Proof. If u = 0 then Inequality (4.2.1) holds. Hence, let u ≠ 0. Then, by Definition 4.1.1.3, $\langle \lambda u + v, \lambda u + v\rangle \geq 0$ for all λ ∈ F and v ∈ V. In particular, for $\lambda = -\frac{\langle v, u\rangle}{\|u\|^2}$, we have
$$0 \leq \langle \lambda u + v, \lambda u + v\rangle = \lambda \overline{\lambda}\, \|u\|^2 + \lambda \langle u, v\rangle + \overline{\lambda}\, \langle v, u\rangle + \|v\|^2 = \frac{\langle v, u\rangle}{\|u\|^2} \frac{\overline{\langle v, u\rangle}}{\|u\|^2} \|u\|^2 - \frac{\langle v, u\rangle}{\|u\|^2} \langle u, v\rangle - \frac{\overline{\langle v, u\rangle}}{\|u\|^2} \langle v, u\rangle + \|v\|^2 = \|v\|^2 - \frac{|\langle v, u\rangle|^2}{\|u\|^2}.$$
Or, in other words, $|\langle v, u\rangle|^2 \leq \|u\|^2 \|v\|^2$ and the proof of the inequality is over.
Now, note that equality holds in Inequality (4.2.1) if and only if $\langle \lambda u + v, \lambda u + v\rangle = 0$, or equivalently, λu + v = 0. Hence, u and v are linearly dependent. Moreover, in that case, $v = -\lambda u = \left\langle v, \frac{u}{\|u\|}\right\rangle \frac{u}{\|u\|}$.
Exercise 4.2.3. 1. Let a, b ∈ R with a, b > 0. Then, prove that $\left(\frac{1}{a} + \frac{1}{b}\right)(a + b) \geq 4$. In general, for 1 ≤ i ≤ n, let $a_i \in \mathbb{R}$ with $a_i > 0$. Then $\left(\sum_{i=1}^{n} a_i\right)\left(\sum_{i=1}^{n} \frac{1}{a_i}\right) \geq n^2$.
3. Let V be an ips. If u, v ∈ V with kuk = 1, kvk = 1 and hu, vi = 1 then prove that u = αv
for some α ∈ F. Is α = 1?
Let V be a real vector space. Then, for u, v ∈ V \ {0}, the Cauchy-Schwartz inequality implies that $-1 \leq \frac{\langle u, v\rangle}{\|u\|\,\|v\|} \leq 1$. This together with the properties of the cosine function is used to
define the angle between two vectors in a real inner product space.
Definition 4.2.4. Let V be a real vector space. If θ ∈ [0, π] is the angle between u, v ∈ V \ {0} then we define
$$\cos \theta = \frac{\langle u, v\rangle}{\|u\|\,\|v\|}.$$
Example 4.2.5. 1. Take $(1, 0)^T, (1, 1)^T \in \mathbb{R}^2$. Then $\cos \theta = \frac{1}{\sqrt{2}}$. So θ = π/4.
2. Take $(1, 1, 0)^T, (1, 1, 1)^T \in \mathbb{R}^3$. Then the angle between them, say β, equals $\cos^{-1} \frac{2}{\sqrt{6}}$.
4. As ⟨x, y⟩ = ⟨y, x⟩ for any real vector space, the angle between x and y is the same as the angle between y and x.
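Definition 4.2.4 is straightforward to compute with. The sketch below is illustrative only (the function name angle is ours); it evaluates the two angles of parts 1 and 2 using the standard inner product.

```python
import numpy as np

def angle(u, v):
    """Angle (in radians) between non-zero vectors, using the dot product."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(c, -1.0, 1.0))   # clip guards against round-off

print(angle([1, 0], [1, 1]))          # pi/4 ~ 0.7854
print(angle([1, 1, 0], [1, 1, 1]))    # arccos(2/sqrt(6)) ~ 0.6155
```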
Figure 4.1: A triangle with vertices A, B and C, where a, b and c denote the lengths of the sides opposite to A, B and C, respectively.
We will now prove that if A, B and C are the vertices of a triangle (see Figure 4.1) and a, b and c, respectively, are the lengths of the corresponding sides then $\cos(A) = \frac{b^2 + c^2 - a^2}{2bc}$. This in turn implies that the angle between vectors has been rightly defined.
Lemma 4.2.6. Let A, B and C be the vertices of a triangle (see Figure 4.1) with corresponding side lengths a, b and c, respectively, in a real inner product space V. Then
$$\cos(A) = \frac{b^2 + c^2 - a^2}{2bc}.$$
Proof. Let 0, u and v be the coordinates of the vertices A, B and C, respectively, of the triangle ABC. Then, $\vec{AB} = u$, $\vec{AC} = v$ and $\vec{BC} = v - u$. Thus, we need to prove that
$$2\,\|u\|\,\|v\|\cos(A) = \|u\|^2 + \|v\|^2 - \|v - u\|^2.$$
Now, by definition $\|v - u\|^2 = \|v\|^2 + \|u\|^2 - 2\langle v, u\rangle$ and hence $\|v\|^2 + \|u\|^2 - \|v - u\|^2 = 2\langle u, v\rangle$. As $\langle v, u\rangle = \|v\|\,\|u\|\cos(A)$, the required result follows.
4. kx + yk2 + kx − yk2 = 2kxk2 + 2kyk2 (parallelogram law: the sum of squares of the
lengths of the diagonals of a parallelogram equals twice the sum of squares of the lengths
of its sides).
Solution: Just expand the left hand side to get the required result.
Proof. As kxk = kx − y + yk ≤ kx − yk + kyk one has kxk − kyk ≤ kx − yk. Similarly, one
obtains kyk − kxk ≤ ky − xk = kx − yk. Combining the two, the required result follows.
Exercise 4.3.4.
3. Let A ∈ Mn (C) satisfy kAxk ≤ kxk for all x ∈ Cn . Then, prove that if α ∈ C with
| α | > 1 then A − αI is invertible.
The next result is stated without proof as the proof is beyond the scope of this book.
Theorem 4.3.5. Let k · k be a norm on a normed linear space V. Then the norm k · k is induced
by some inner product if and only if k · k satisfies the parallelogram law:
kx + yk2 + kx − yk2 = k(1, 1)T k2 + k(1, −1)T k2 = (|1| + |1|)2 + (|1| + | − 1|)2 = 8.
So the parallelogram law fails. Thus, kxk is not induced by any inner product in R2 .
Exercise 4.3.7. Does there exist an inner product in R2 such that kxk = max{|x1 |, |x2 |}?
We now come back to the study of inner product spaces, the topic which is a building block for most of the applications. To start with, we give the definition of orthogonality of two vectors.
2. If V is a vector space over R or C then 0 is the only vector that is orthogonal to itself.
3. Let V = R.
(c) Let S be any subset of R containing a non-zero real number. Then S ⊥ = {0}.
$$x = \langle x, u\rangle \frac{u}{\|u\|^2} + \left(x - \langle x, u\rangle \frac{u}{\|u\|^2}\right) = \frac{x_1 + 2x_2}{5}\,(1, 2)^T + \frac{2x_1 - x_2}{5}\,(2, -1)^T$$
is a decomposition of x into two vectors, one parallel to u and the other parallel to $u^{\perp}$.
7. Let P = (1, 1, 1)T , Q = (2, 1, 3)T and R = (−1, 1, 2)T be three vertices of a triangle in R3 .
Compute the angle between the sides P Q and P R.
Solution: Method 1: Note that $\vec{PQ} = (2, 1, 3)^T - (1, 1, 1)^T = (1, 0, 2)^T$, $\vec{PR} = (-2, 0, 1)^T$ and $\vec{RQ} = (-3, 0, -1)^T$. As $\langle \vec{PQ}, \vec{PR}\rangle = 0$, the angle between the sides PQ and PR is $\frac{\pi}{2}$.
Method 2: $\|PQ\| = \sqrt{5}$, $\|PR\| = \sqrt{5}$ and $\|QR\| = \sqrt{10}$. As $\|QR\|^2 = \|PQ\|^2 + \|PR\|^2$, by the Pythagoras theorem, the angle between the sides PQ and PR is $\frac{\pi}{2}$.
Exercise 4.4.3. 1. Let V be an ips.
(a) If S ⊆ V then S ⊥ is a subspace of V and S ⊥ = (LS(S))⊥ .
(b) Furthermore, if V is finite dimensional then S ⊥ and LS(S) are complementary.
Thus, V = LS(S) ⊕ S ⊥ . Equivalently, hu, wi = 0, for all u ∈ LS(S) and w ∈ S ⊥ .
2. Find v, w ∈ R3 such that v, w and (1, −1, −2)T are mutually orthogonal.
3. Let W = {(x, y, z, w)T ∈ R4 : x + y + z − w = 0}. Find a basis of W⊥ .
4. Determine W⊥ , where W = {A ∈ Mn (R) | AT = A}.
(b) k such that cos−1 (hu, vi) = π/3, where u = (1, −1, 1)T and v = (1, k, 1)T .
(c) vectors v, w ∈ R3 such that v, w and u = (1, 1, 1)T are mutually orthogonal.
6. Consider R3 with the standard inner product. Find the plane containing
(a) (1, 1 − 1) with (a, b, c) 6= 0 as the normal vector.
(b) (2, −2, 1)T and perpendicular to the line ` = {(t − 1, 3t + 2, t + 1) : t ∈ R}.
(c) the lines (1, 2, −2) + t(1, 1, 0) and (1, 2, −2) + t(0, 1, 2).
(d) (1, 1, 2)T and orthogonal to the line `{(2 + t, 3, 1 − t) : t ∈ R}.
7. Let P = (3, 0, 2)T , Q = (1, 2, −1)T and R = (2, −1, 1)T be three points in R3 . Then,
(a) find the area of the triangle with vertices P, Q and R.
(b) find the area of the parallelogram built on vectors P~Q and QR.
~
(c) find a non-zero vector orthogonal to the plane of the above triangle.
√
(d) find all vectors x orthogonal to P~Q and QR~ with kxk = 2.
(e) the volume of the parallelepiped built on vectors P~Q and QR
~ and x, where x is one
of the vectors found in Part (d). Do you think the volume would be different if you
choose the other vector x?
8. Let p1 be a plane containing the point A = (1, 2, 3)T and the vector (2, −1, 1)T as its
normal. Then,
(a) find the equation of the plane p2 that is parallel to p1 and contains (−1, 2, −3)T .
9. In the parallelogram ABCD, ABkDC and ADkBC and A = (−2, 1, 3)T , B = (−1, 2, 2)T
and C = (−3, 1, 5)T . Find the
$(1, 0)^T, (0, 1)^T$;  $\frac{1}{\sqrt{2}}(1, 1)^T, \frac{1}{\sqrt{2}}(1, -1)^T$;  and  $\frac{1}{\sqrt{5}}(2, 1)^T, \frac{1}{\sqrt{5}}(1, -2)^T$.
4. Recall that $\langle f(x), g(x)\rangle = \int_{-\pi}^{\pi} f(x) g(x)\, dx$ defines the standard inner product in C[−π, π].
Consider S = {1} ∪ {e_m | m ≥ 1} ∪ {f_n | n ≥ 1}, where 1(x) = 1, e_m(x) = cos(mx) and
fn (x) = sin(nx), for all m, n ≥ 1 and for all x ∈ [−π, π]. Then,
(a) S is a linearly independent set.
(b) k1k2 = 2π, kem k2 = π and kfn k2 = π.
(c) the functions in S are orthogonal.
1 1 1
Hence, √ ∪ √ em | m ≥ 1 ∪ √ fn | n ≥ 1 is an orthonormal set in C[−π, π].
2π π π
We now prove the most important initial result of this section.
Hence ci = 0, for 1 ≤ i ≤ n. Thus, the above linear system has only the trivial solution. So,
the set S is linearly independent.
Part 2: Note that $\langle v, u_i\rangle = \left\langle \sum_{j=1}^{n} \alpha_j u_j, u_i\right\rangle = \sum_{j=1}^{n} \alpha_j \langle u_j, u_i\rangle = \alpha_i \langle u_i, u_i\rangle = \alpha_i$. This completes Sub-part (a). For Sub-part (b), we have
$$\|v\|^2 = \left\| \sum_{i=1}^{n} \alpha_i u_i \right\|^2 = \left\langle \sum_{i=1}^{n} \alpha_i u_i, \sum_{j=1}^{n} \alpha_j u_j \right\rangle = \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \overline{\alpha_j}\, \langle u_i, u_j\rangle = \sum_{i=1}^{n} \alpha_i \overline{\alpha_i}\, \langle u_i, u_i\rangle = \sum_{i=1}^{n} |\alpha_i|^2.$$
Furthermore, if x = y then $\|x\|^2 = \sum_{i=1}^{n} |\langle x, v_i\rangle|^2$ (generalizing the Pythagoras Theorem).
We have another corollary of Theorem 4.4.6 which talks about an orthogonal set.
Theorem 4.4.8 (Bessel's Inequality). Let V be an ips with {v1, · · · , vn} as an orthogonal set. Then, for each z ∈ V,
$$\sum_{k=1}^{n} \frac{|\langle z, v_k\rangle|^2}{\|v_k\|^2} \leq \|z\|^2. \quad \text{Equality holds if and only if } z = \sum_{k=1}^{n} \frac{\langle z, v_k\rangle}{\|v_k\|^2}\, v_k.$$
Proof. For 1 ≤ k ≤ n, define $u_k = \frac{v_k}{\|v_k\|}$ and use Theorem 4.4.6.4 to get the required result.
Remark 4.4.9. Using Theorem 4.4.6, we see that if B = (v1, . . . , vn) is an ordered orthonormal basis of an ips V then
$$[u]_B = \begin{bmatrix} \langle u, v_1\rangle\\ \vdots\\ \langle u, v_n\rangle \end{bmatrix}, \quad \text{for each } u \in V.$$
Thus, to get the coordinates of a vector with respect to an orthonormal ordered basis, we just need to compute the inner products with the basis vectors.
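Remark 4.4.9 says that coordinates with respect to an orthonormal ordered basis are just inner products. A small illustrative sketch (with a basis of R² chosen by us) that computes $[u]_B$ and reconstructs u:

```python
import numpy as np

v1 = np.array([1, 1]) / np.sqrt(2)      # orthonormal ordered basis of R^2
v2 = np.array([1, -1]) / np.sqrt(2)
u = np.array([3.0, 1.0])

coords = np.array([u @ v1, u @ v2])     # [u]_B = (<u,v1>, <u,v2>)^T
print(coords)                            # [2*sqrt(2), sqrt(2)]
print(coords[0] * v1 + coords[1] * v2)   # recovers u = (3, 1)
```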
To proceed further with the applications of the above ideas, we pose a question for better
understanding.
Example 4.4.10. Which point on the plane P is closest to the point, say Q?
Thus, we see that given u, v ∈ V \ {0}, we need to find two vectors, say y and z, such that
y is parallel to u and z is perpendicular to u. Thus, y = u cos(θ) and z = u sin(θ), where θ is
the angle between u and v.
Figure 4.2: The orthogonal projection of v on u: $\vec{OQ} = \frac{\langle v, u\rangle}{\|u\|^2}\, u$ and $\vec{OR} = v - \frac{\langle v, u\rangle}{\|u\|^2}\, u$, where θ is the angle between u and v.
We do this as follows (see Figure 4.2). Let $\hat{u} = \frac{u}{\|u\|}$ be the unit vector in the direction of u. Then, using trigonometry, $\cos(\theta) = \frac{\|\vec{OQ}\|}{\|\vec{OP}\|}$. Hence $\|\vec{OQ}\| = \|\vec{OP}\| \cos(\theta)$. Now using Definition 4.2.4, $\|\vec{OQ}\| = \|v\| \frac{|\langle v, u\rangle|}{\|v\|\,\|u\|} = \frac{|\langle v, u\rangle|}{\|u\|}$, where the absolute value is taken as the length/norm is a positive quantity. Thus,
$$\vec{OQ} = \|\vec{OQ}\|\, \hat{u} = \left\langle v, \frac{u}{\|u\|}\right\rangle \frac{u}{\|u\|}.$$
Hence, $y = \vec{OQ} = \left\langle v, \frac{u}{\|u\|}\right\rangle \frac{u}{\|u\|}$ and $z = v - \left\langle v, \frac{u}{\|u\|}\right\rangle \frac{u}{\|u\|}$. In the literature, the vector $y = \vec{OQ}$ is called the orthogonal projection of v on u, denoted $\mathrm{Proj}_u(v)$. Thus,
$$\mathrm{Proj}_u(v) = \left\langle v, \frac{u}{\|u\|}\right\rangle \frac{u}{\|u\|} \quad \text{and} \quad \|\mathrm{Proj}_u(v)\| = \|\vec{OQ}\| = \frac{|\langle v, u\rangle|}{\|u\|}. \qquad (4.4.2)$$
Moreover, the distance of the point P from the line determined by u equals $\|\vec{OR}\| = \|\vec{PQ}\| = \left\| v - \left\langle v, \frac{u}{\|u\|}\right\rangle \frac{u}{\|u\|} \right\|$.
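Equation (4.4.2) is easy to implement. The sketch below is illustrative only (the function name proj is ours); it computes $\mathrm{Proj}_u(v)$ and the perpendicular component and checks that the latter is orthogonal to u.

```python
import numpy as np

def proj(v, u):
    """Orthogonal projection of v on u: (<v,u>/||u||^2) u."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return (np.dot(v, u) / np.dot(u, u)) * u

v = np.array([1.0, 2.0, 3.0])
u = np.array([1.0, 1.0, 0.0])
y = proj(v, u)                  # component of v parallel to u
z = v - y                       # component of v perpendicular to u
print(y, z, np.dot(z, u))       # the last value is 0 up to round-off
```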
Example 4.4.11. 1. Determine the foot of the perpendicular from the point (1, 2, 3) on the XY-plane.
Solution: Verify that the required point is (1, 2, 0).
2. Determine the foot of the perpendicular from the point Q = (1, 2, 3, 4) on the plane generated by (1, 1, 0, 0), (1, 0, 1, 0) and (0, 1, 1, 1).
Answer: (x, y, z, w) lies on the plane $x - y - z + 2w = 0 \Leftrightarrow \langle (1, -1, -1, 2), (x, y, z, w)\rangle = 0$. So, the required point equals
$$(1, 2, 3, 4) - \left\langle (1, 2, 3, 4), \tfrac{1}{\sqrt{7}}(1, -1, -1, 2)\right\rangle \tfrac{1}{\sqrt{7}}(1, -1, -1, 2) = (1, 2, 3, 4) - \tfrac{4}{7}(1, -1, -1, 2) = \tfrac{1}{7}(3, 18, 25, 20).$$
(a) $v_1 = \mathrm{Proj}_u(v) = \langle v, u\rangle \frac{u}{\|u\|^2} = \frac{1}{4} u = \frac{1}{4}(1, 1, 1, 1)^T$ is parallel to u.
Note that $\mathrm{Proj}_u(w)$ is parallel to u and $\mathrm{Proj}_{v_2}(w)$ is parallel to $v_2$. Hence, we have
(a) $w_1 = \mathrm{Proj}_u(w) = \langle w, u\rangle \frac{u}{\|u\|^2} = \frac{1}{4} u = \frac{1}{4}(1, 1, 1, 1)^T$ is parallel to u,
is a set of linearly independent vectors in V then there exists an orthonormal set {w1 , . . . , wn }
Proof. Note that for orthonormality, we need $\|w_i\| = 1$, for 1 ≤ i ≤ n, and $\langle w_i, w_j\rangle = 0$, for 1 ≤ i ≠ j ≤ n. Also, by Corollary 3.3.11.2, $v_i \notin LS(v_1, \ldots, v_{i-1})$, for 2 ≤ i ≤ n, as {v1, . . . , vn} is a linearly independent set. We are now ready to prove the result by induction.
Step 1: Define $w_1 = \frac{v_1}{\|v_1\|}$. Then $LS(v_1) = LS(w_1)$.
Step 2: Define $u_2 = v_2 - \langle v_2, w_1\rangle w_1$. Then, $u_2 \neq 0$ as $v_2 \notin LS(v_1)$. So, let $w_2 = \frac{u_2}{\|u_2\|}$. Note that {w1, w2} is orthonormal and $LS(w_1, w_2) = LS(v_1, v_2)$.
Step 3: For induction, assume that we have obtained an orthonormal set {w1, . . . , w_{k−1}} such that $LS(v_1, \ldots, v_{k-1}) = LS(w_1, \ldots, w_{k-1})$. Now, note that
$$u_k = v_k - \sum_{i=1}^{k-1} \langle v_k, w_i\rangle w_i = v_k - \sum_{i=1}^{k-1} \mathrm{Proj}_{w_i}(v_k) \neq 0, \quad \text{as } v_k \notin LS(v_1, \ldots, v_{k-1}).$$
So, let us put $w_k = \frac{u_k}{\|u_k\|}$. Then, {w1, . . . , wk} is orthonormal as $\|w_k\| = 1$ and
$$\|u_k\|\,\langle w_k, w_1\rangle = \langle u_k, w_1\rangle = \left\langle v_k - \sum_{i=1}^{k-1} \langle v_k, w_i\rangle w_i,\ w_1\right\rangle = \langle v_k, w_1\rangle - \sum_{i=1}^{k-1} \langle v_k, w_i\rangle \langle w_i, w_1\rangle = \langle v_k, w_1\rangle - \langle v_k, w_1\rangle = 0.$$
As $v_k = \|u_k\| w_k + \sum_{i=1}^{k-1} \langle v_k, w_i\rangle w_i$, we get $v_k \in LS(w_1, \ldots, w_k)$. Hence, by the principle of mathematical induction, $LS(w_1, \ldots, w_k) = LS(v_1, \ldots, v_k)$ and the required result follows.
Example 4.5.2. 1. Let S = {(1, −1, 1, 1), (1, 0, 1, 0), (0, 1, 0, 1)} ⊆ R⁴. Find an orthonormal set T such that LS(S) = LS(T).
Solution: As we just require LS(S) = LS(T), we can order the vectors as per our convenience. So, let $v_1 = (1, 0, 1, 0)^T$, $v_2 = (0, 1, 0, 1)^T$ and $v_3 = (1, -1, 1, 1)^T$. Then, $w_1 = \frac{1}{\sqrt{2}}(1, 0, 1, 0)^T$. As $\langle v_2, w_1\rangle = 0$, we get $w_2 = \frac{1}{\sqrt{2}}(0, 1, 0, 1)^T$. For the third vector, let $u_3 = v_3 - \langle v_3, w_1\rangle w_1 - \langle v_3, w_2\rangle w_2 = (0, -1, 0, 1)^T$. Thus, $w_3 = \frac{1}{\sqrt{2}}(0, -1, 0, 1)^T$.
2. Let $S = \left\{ v_1 = \begin{bmatrix} 2\\ 0\\ 0 \end{bmatrix},\ v_2 = \begin{bmatrix} \frac{3}{2}\\ 2\\ 0 \end{bmatrix},\ v_3 = \begin{bmatrix} \frac{1}{2}\\ \frac{3}{2}\\ 0 \end{bmatrix},\ v_4 = \begin{bmatrix} 1\\ 1\\ 1 \end{bmatrix} \right\}$. Find an orthonormal set T such that LS(S) = LS(T).
Solution: Take $w_1 = \frac{v_1}{\|v_1\|} = \begin{bmatrix} 1\\ 0\\ 0 \end{bmatrix} = e_1$. For the second vector, consider $u_2 = v_2 - \frac{3}{2} w_1 = \begin{bmatrix} 0\\ 2\\ 0 \end{bmatrix}$. So, put $w_2 = \frac{u_2}{\|u_2\|} = \begin{bmatrix} 0\\ 1\\ 0 \end{bmatrix} = e_2$.
For the third vector, let $u_3 = v_3 - \sum_{i=1}^{2} \langle v_3, w_i\rangle w_i = (0, 0, 0)^T$. So, $v_3 \in LS(w_1, w_2)$. Or equivalently, the set {v1, v2, v3} is linearly dependent.
So, for again computing the third vector, define $u_4 = v_4 - \sum_{i=1}^{2} \langle v_4, w_i\rangle w_i$. Then, $u_4 = v_4 - w_1 - w_2 = e_3$. So $w_4 = e_3$. Hence, T = {w1, w2, w4} = {e1, e2, e3}.
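The proof of the Gram-Schmidt process is itself an algorithm. The sketch below is illustrative only (the function name is ours); it orthonormalizes a list of vectors, dropping any vector already in the span of the previous ones (as in part 2 above), and reproduces the set T of part 1 of Example 4.5.2.

```python
import numpy as np

def gram_schmidt(vectors, tol=1e-12):
    """Return an orthonormal basis of the span of the given vectors."""
    basis = []
    for v in vectors:
        u = np.asarray(v, dtype=float)
        for w in basis:
            u = u - np.dot(u, w) * w        # subtract Proj_w(v)
        if np.linalg.norm(u) > tol:          # keep only genuinely new directions
            basis.append(u / np.linalg.norm(u))
    return basis

S = [(1, 0, 1, 0), (0, 1, 0, 1), (1, -1, 1, 1)]
for w in gram_schmidt(S):
    print(np.round(w, 4))    # (1,0,1,0)/sqrt(2), (0,1,0,1)/sqrt(2), (0,-1,0,1)/sqrt(2)
```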
Observe that (−2, 1, 0) and (−1, 0, 1) are orthogonal to (1, 2, 1) but are themselves not orthogonal.
Method 1: Apply the Gram-Schmidt process to $\left\{\frac{1}{\sqrt{6}}(1, 2, 1)^T, (-2, 1, 0)^T, (-1, 0, 1)^T\right\} \subseteq \mathbb{R}^3$.
Method 2: Valid only in R³, using the cross product of two vectors.
In either case, verify that $\left\{\frac{1}{\sqrt{6}}(1, 2, 1), \frac{-1}{\sqrt{5}}(2, -1, 0), \frac{-1}{\sqrt{30}}(1, 2, -5)\right\}$ is the required set.
2. If S is linearly dependent then as in Example 4.5.2.2, there will be stages at which the
vector uk = 0. Thus, we will obtain an orthonormal basis {w1 , . . . , wm } of LS(S), but
note that m < n.
3. a re-arrangement of elements of S then we may obtain another orthonormal basis of
LS(v1 , . . . , vn ). But, observe that the size of the two bases will be the same.
Exercise 4.5.5. 1. Let (V, h , i) be an n-dimensional ips. If u ∈ V with kuk = 1 then give
reasons for the following statements.
Then prove that $A_k^T = A_k$ and $A_k^2 = A_k$. Thus, the $A_k$'s are projection matrices.
5. Determine an orthonormal basis of R4 containing (1, −2, 1, 3)T and (2, 1, −3, 1)T .
6. Let x ∈ Rn with kxk = 1.
(a) Then prove that {x} can be extended to form an orthonormal basis of Rn .
(b) Let the extended basis be $\{x, x_2, \ldots, x_n\}$ and $B = [e_1, \ldots, e_n]$ the standard ordered basis of $\mathbb{R}^n$. Prove that $A = \big[[x]_B, [x_2]_B, \ldots, [x_n]_B\big]$ is an orthogonal matrix.
7. Let v, w ∈ Rn , n ≥ 1, with ‖v‖ = ‖w‖ = 1. Prove that there exists an orthogonal matrix P such that P v = w. Prove also that P can be chosen such that det(P ) = 1.
4.6 QR Decomposition
In this section, we study the QR-decomposition of a matrix A ∈ Mn (R). The decomposition
is obtained by applying the Gram-Schmidt Orthogonalization process to the columns of the
matrix A. Thus, the set {A[:, 1], . . . , A[:, n]} of the columns of A are taken as the collection of
vectors {v1 , . . . , vn }.
If $\mathrm{Rank}(A) = n$ then the columns of $A$ are linearly independent and the application of the Gram-Schmidt process gives us vectors $\{w_1, \ldots, w_n\} \subseteq \mathbb{R}^n$ such that the matrix $Q = [w_1 \ \cdots \ w_n]$ is an orthogonal matrix. Further, the condition $A[:, i] \in LS(w_1, \ldots, w_i)$ gives scalars $\alpha_{ji}$, $1 \le j \le i$, with $A[:, i] = \sum_{j=1}^{i} \alpha_{ji} w_j$, i.e., $A = QR$ with $R = [\alpha_{ji}]$ an upper triangular matrix.
Theorem 4.6.1 (QR Decomposition). Let A ∈ Mn (R) be a matrix with Rank(A) = n. Then,
there exist matrices Q and R such that Q is orthogonal and R is upper triangular with A = QR.
Furthermore, the diagonal entries of R can be chosen to be positive. Also, in this case, the
decomposition is unique.
Proof. The argument before the statement of the theorem gives us A = QR, with
1. Q being an orthogonal matrix (see Exercise 5.8.8.5) and
2. R being an upper triangular matrix.
Thus, this completes the proof of the first part. Note that
2. if αii < 0, for some i, 1 ≤ i ≤ n, then we can replace wi in Q by −wi to get new matrices Q and R with the added condition that the diagonal entries of R are positive.
Remark 4.6.2. Note that in the proof of Theorem 4.6.1, we just used the idea that $A[:, i] \in LS(w_1, \ldots, w_i)$ to get the scalars $\alpha_{ji}$, for $1 \le j \le i$. As $\{w_1, \ldots, w_i\}$ is an orthonormal set, $\alpha_{ji} = \langle A[:, i], w_j\rangle$. So, it is quite easy to compute the entries of the upper triangular matrix R.
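As a small illustration of this remark, the following sketch (an illustrative example, not from the text; it assumes NumPy) builds Q and R column by column, with R[j, i] = ⟨A[:, i], w_j⟩ exactly as described above.

```python
import numpy as np

def qr_gram_schmidt(A):
    """QR via Gram-Schmidt for A with linearly independent columns.

    Q has orthonormal columns; R is upper triangular with positive diagonal,
    R[j, i] = <A[:, i], w_j> as in the remark above.
    """
    A = np.array(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for i in range(n):
        u = A[:, i].copy()
        for j in range(i):
            R[j, i] = np.dot(A[:, i], Q[:, j])   # alpha_{ji}
            u -= R[j, i] * Q[:, j]
        R[i, i] = np.linalg.norm(u)
        Q[:, i] = u / R[i, i]
    return Q, R

A = np.array([[2.0, -1.0], [1.0, 2.0]])          # a sample matrix of full rank
Q, R = qr_gram_schmidt(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(2)))   # True True
```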
Hence, proceeding on the lines of the above theorem, one has the following result.
−1
√ 0 0 √ 0 0 2 0
2 2
1 1 1
0 √ √ 0 0 0 0 √
2
2 2
2. Let $A = \begin{bmatrix} 1 & 1 & 1 & 0 \\ -1 & 0 & -2 & 1 \\ 1 & 1 & 1 & 0 \\ 1 & 0 & 2 & 1 \end{bmatrix}$. Find a $4 \times 3$ matrix $Q$ satisfying $Q^T Q = I_3$ and an upper triangular matrix $R$ such that $A = QR$.
Solution: Let us apply the Gram-Schmidt orthonormalization process to the columns of
A. As $v_1 = (1, -1, 1, 1)^T$, we get $w_1 = \frac{1}{2} v_1$. Let $v_2 = (1, 0, 1, 0)^T$. Then,
$$u_2 = v_2 - \langle v_2, w_1\rangle w_1 = (1, 0, 1, 0)^T - w_1 = \frac{1}{2}(1, 1, 1, -1)^T.$$
Hence, $w_2 = \frac{1}{2}(1, 1, 1, -1)^T$. Let $v_3 = (1, -2, 1, 2)^T$. Then,
(a) Rank(A) = 3,
2. By Theorem 4.6.3, there exist matrices Q ∈ Mm,n (R) and R ∈ Mn,n (R) such that A = QR.
6. Thus, $P = A(A^T A)^{-1} A^T = QQ^T = [w_1, \ldots, w_n]\begin{bmatrix} w_1^T \\ \vdots \\ w_n^T \end{bmatrix} = \sum_{i=1}^{n} w_i w_i^T$ is the projection matrix onto $\mathrm{Col}(A)$.
4.7 Summary
In the previous chapter, we learnt that if V is vector space over F with dim(V) = n then V
basically looks like Fn . Also, any subspace of Fn is either Col(A) or Null(A) or both, for some
matrix A with entries from F.
So, we started this chapter with inner product, a generalization of the dot product in R3
or Rn . We used the inner product to define the length/norm of a vector. The norm has the
property that “the norm of a vector is zero if and only if the vector itself is the zero vector”. We
then proved the Cauchy-Schwarz inequality, which helped us in defining the angle between two vectors. Thus, one can talk of geometrical problems in Rn and prove some geometrical results.
We then independently defined the notion of a norm in Rn and showed that a norm is
induced by an inner product if and only if the norm satisfies the parallelogram law (sum of
squares of the diagonal equals twice the sum of square of the two non-parallel sides).
The next subsection dealt with the fundamental theorem of linear algebra where we showed
that if A ∈ Mm,n (C) then
1. dim(Null(A)) + dim(Col(A)) = n.
2. $\mathrm{Null}(A) = \big(\mathrm{Col}(A^*)\big)^{\perp}$ and $\mathrm{Null}(A^*) = \big(\mathrm{Col}(A)\big)^{\perp}$.
We then saw that having an orthonormal basis is an asset as determining the coordinates
of a vector boils down to computing the inner product.
So, the question arises, how do we compute an orthonormal basis? This is where we came
across the Gram-Schmidt Orthonormalization process. This algorithm helps us to determine
an orthonormal basis of LS(S) for any finite subset S of a vector space. This also led to the
QR-decomposition of a matrix.
Chapter 5
Linear Transformations
5.1 Definitions and Basic Properties

Definition 5.1.1. Let V and W be vector spaces over F with vector operations +, · in V and ⊕, ⊙ in W. A function (map) f : V → W is called a linear transformation if, for all α ∈ F and u, v ∈ V, the function f satisfies
$$f(u + v) = f(u) \oplus f(v) \quad \text{and} \quad f(\alpha \cdot v) = \alpha \odot f(v). \tag{5.1.1}$$
By L(V, W), we denote the set of all linear transformations from V to W. In particular, if
W = V then the linear transformation f is called a linear operator and the corresponding set
of linear operators is denoted by L(V).
Even though, in the definition above, we have differentiated between the vector addition
and scalar multiplication for domain and co-domain, we will not differentiate them in the book
unless necessary.
Equation (5.1.1) just states that the two operations, namely taking the image (applying f) and doing the vector space operations (vector addition and scalar multiplication), commute: first applying the vector operations (u + v or αv) and then looking at the images (f(u + v) or f(αv)) is the same as first computing the images (f(u), f(v)) and then applying the vector operations (f(u) + f(v) and αf(v)). Or equivalently, we look at only those functions which preserve the vector operations.
Definition 5.1.2. Let g, h ∈ L(V, W). Then g and h are said to be equal if g(x) = h(x), for
all x ∈ V.
Example 5.1.3. 1. Let V be a vector space. Then, the maps Id, 0 ∈ L(V), where
(a) Id(v) = v, for all v ∈ V, is commonly called the identity operator.
(b) 0(v) = 0, for all v ∈ V, is commonly called the zero operator.
2. Let V and W be vector spaces over F. Then, 0 ∈ L(V, W), where 0(v) = 0, for all v ∈ V,
is commonly called the zero transformation.
4. Let V, W and Z be vector spaces over F. Then, for any T ∈ L(V, W) and S ∈ L(W, Z), the map S ◦ T ∈ L(V, Z), defined by (S ◦ T)(v) = S(T(v)) for all v ∈ V, is called the composition of S and T.
5. Fix a ∈ Rn and define f (x) = aT x, for all x ∈ Rn . Then f ∈ L(Rn , R). In particular, if
x = [x1 , . . . , xn ]T then, for all x ∈ Rn ,
(a) $f(x) = \sum_{i=1}^{n} x_i = \mathbf{1}^T x$ is a linear transformation.
(b) $f_i(x) = x_i = e_i^T x$ is a linear transformation, for $1 \le i \le n$.
7. Fix A ∈ Mm×n (C). Define fA (x) = Ax, for every x ∈ Cn . Then, fA ∈ L(Cn , Cm ). Thus,
for each A ∈ Mm,n (C), there exists a linear transformation fA ∈ L(Cn , Cm ).
10. Is the map T : R[x; n] → R[x; n + 1] defined by T (f (x)) = xf (x), for all f (x) ∈ R[x; n] a
linear transformation?
11. The maps $T, S : \mathbb{R}[x] \to \mathbb{R}[x]$ defined by $T(f(x)) = \frac{d}{dx} f(x)$ and $S(f(x)) = \int_0^x f(t)\,dt$, for all $f(x) \in \mathbb{R}[x]$, are linear transformations. Is it true that $TS = \mathrm{Id}$? What about $ST$?
12. Recall the vector space RN in Example 3.1.4.7. Now, define maps T, S : RN → RN
by T ({a1 , a2 , . . .}) = {0, a1 , a2 , . . .} and S({a1 , a2 , . . .}) = {a2 , a3 , . . .}. Then, T and S,
commonly called the shift operators, are linear operators with exactly one of ST or T S
as the Id map.
13. Recall the vector space C(R, R) (see Example 3.1.4.9). Define T : C(R, R) → C(R, R) by
Rx Rx
T (f )(x) = f (t)dt. For example, T (sin)(x) = sin(t)dt = 1−cos(x), for all x ∈ R. Then,
0 0
verify that T is a linear transformation.
Remark 5.1.4. Let A ∈ Mn (C) and define TA : Cn → Cn by TA (x) = Ax, for every x ∈ Cn .
Then, verify that $T_A^k(x) = (\underbrace{T_A \circ T_A \circ \cdots \circ T_A}_{k \text{ times}})(x) = A^k x$, for any positive integer $k$.
Exercise 5.1.5. Fix A ∈ Mn (C). Then, do the following maps define linear transformations?
1. Define f, g : Mn (C) → Mn (C) by f (B) = A∗ B and g(B) = BA, for every B ∈ Mn (C).
2. Define h, t : Mn (C) → C by h(B) = tr(A∗ B) and t(B) = tr(BA), for every B ∈ Mn (C).
We now prove that any linear transformation sends the zero vector to a zero vector.
Proposition 5.1.6. Let T ∈ L(V, W). Suppose that 0V is the zero vector in V and 0W is the
zero vector of W. Then T (0V ) = 0W .
Hence T (0V ) = 0W .
From now on, 0 will be used as the zero vector of both the domain and the co-domain. We now define the range of a linear transformation.
Definition 5.1.8. Let f ∈ L(V, W). Then the range/ image of f , denoted Rng(f ) or Im(f ),
is given by Rng(f ) = {f (x) : x ∈ V}.
As an exercise, show that Rng(f ) is a subspace of W. The next result, which is a very
important result, states that a linear transformation is known if we know its image on a basis
of the domain space.
Lemma 5.1.9. Let V and W be vector spaces over F with B = {v1 , v2 , . . .} as a basis of V. If f ∈ L(V, W) then f is determined if we know the set {f(v1), f(v2), . . .}, i.e., if we know the image of f on the basis vectors of V, or equivalently, Rng(f) = LS(f(x) | x ∈ B).
Proof. Let B be a basis of V over F. Then, for each v ∈ V, there exist vectors u1 , . . . , uk in B and scalars c1 , . . . , ck ∈ F such that $v = \sum_{i=1}^{k} c_i u_i$. Thus
$$f(v) = f\Big(\sum_{i=1}^{k} c_i u_i\Big) = \sum_{i=1}^{k} f(c_i u_i) = \sum_{i=1}^{k} c_i f(u_i).$$
Or equivalently, whenever
$$v = [u_1, \ldots, u_k]\begin{bmatrix} c_1 \\ \vdots \\ c_k \end{bmatrix} \quad \text{then} \quad f(v) = \big[f(u_1)\ \cdots\ f(u_k)\big]\begin{bmatrix} c_1 \\ \vdots \\ c_k \end{bmatrix}. \tag{5.1.2}$$
Thus, the image of f on v just depends on where the basis vectors are mapped. Equation (5.1.2) also shows that $\mathrm{Rng}(f) = LS(f(x) \mid x \in B)$.
$\mathrm{Rng}(T) = LS\big(T(e_1), T(e_2), T(e_3)\big) = LS\big((1, 0, 1, 2)^T, (-1, 1, 0, -5)^T, (1, -1, 0, 5)^T\big) = LS\big((1, 0, 1, 2)^T, (1, -1, 0, 5)^T\big) = \{\lambda(1, 0, 1, 2)^T + \beta(1, -1, 0, 5)^T \mid \lambda, \beta \in \mathbb{R}\}$.
2. Let B ∈ M2 (R). Now, define a map T : M2 (R) → M2 (R) by T (A) = BA − AB, for all
A ∈ M2 (R). Determine Rng(T ) and Null(T ).
Solution: Recall that {eij |1 ≤ i, j ≤ 2} is a basis of M2 (R). So,
(a) if B = cI2 then Rng(T ) = {0}.
(b) if $B = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}$ then $T(e_{11}) = \begin{bmatrix} 0 & -2 \\ 2 & 0 \end{bmatrix}$, $T(e_{12}) = \begin{bmatrix} -2 & -3 \\ 0 & 2 \end{bmatrix}$, $T(e_{21}) = \begin{bmatrix} 2 & 0 \\ 3 & -2 \end{bmatrix}$ and $T(e_{22}) = \begin{bmatrix} 0 & 2 \\ -2 & 0 \end{bmatrix}$. Thus, $\mathrm{Rng}(T) = LS\left(\begin{bmatrix} 0 & 2 \\ -2 & 0 \end{bmatrix}, \begin{bmatrix} 2 & 3 \\ 0 & -2 \end{bmatrix}, \begin{bmatrix} -2 & 0 \\ -3 & 2 \end{bmatrix}\right)$.
(c) for $B = \begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix}$, verify that $\mathrm{Rng}(T) = LS\left(\begin{bmatrix} 0 & 2 \\ -2 & 0 \end{bmatrix}, \begin{bmatrix} 2 & 2 \\ 0 & -2 \end{bmatrix}, \begin{bmatrix} -2 & 0 \\ -2 & 2 \end{bmatrix}\right)$.
Recall that by Example 5.1.3.5, for each a ∈ Rn , the map T (x) = aT x, for each x ∈ Rn , is
a linear transformation from Rn to R. We now show that these are the only ones.
Corollary 5.1.11. [Riesz Representation Theorem] Let T ∈ L(Rn , R). Then, there exists
a ∈ Rn such that T (x) = aT x.
Proof. By Lemma 5.1.9, T is known if we know the image of T on {e1 , . . . , en }, the standard
basis of Rn . So, for 1 ≤ i ≤ n, let T (ei ) = ai , for some ai ∈ R. Now define a = [a1 , . . . , an ]T
and $x = [x_1, \ldots, x_n]^T \in \mathbb{R}^n$. Then, for all $x \in \mathbb{R}^n$,
$$T(x) = T\Big(\sum_{i=1}^{n} x_i e_i\Big) = \sum_{i=1}^{n} x_i T(e_i) = \sum_{i=1}^{n} x_i a_i = a^T x.$$
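A quick numerical check of this corollary: the vector a can be recovered by evaluating T on the standard basis. The functional T below is a made-up example used only for illustration; NumPy is assumed.

```python
import numpy as np

# Sketch of Corollary 5.1.11: any linear T : R^n -> R is x |-> a^T x,
# where a_i = T(e_i).  T here is a sample linear functional (an assumption).
def T(x):
    return 3 * x[0] - x[1] + 2 * x[2]

n = 3
a = np.array([T(e) for e in np.eye(n)])   # a = (T(e_1), ..., T(e_n))
x = np.array([1.0, 4.0, -2.0])
print(a, T(x), a @ x)                     # T(x) equals a^T x
```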
Example 5.1.12. In each of the examples given below, state whether a linear transformation
exists or not. If yes, give at least one linear transformation. If not, then give the condition due
to which a linear transformation doesn’t exist.
1. Can we construct a linear transformation T : R2 → R2 such that T ((1, 1)T ) = (e, 2)T and
T ((2, 1)T ) = (5, 4)T ?
Solution: The first thing that we need to answer is “is the set {(1, 1), (2, 1)} linearly
independent”? The answer is ‘Yes’. So, we can construct it. So, how do we do it?
Note that $\begin{bmatrix} x \\ y \end{bmatrix} = \alpha \begin{bmatrix} 1 \\ 1 \end{bmatrix} + \beta \begin{bmatrix} 2 \\ 1 \end{bmatrix}$ for suitable $\alpha, \beta \in \mathbb{R}$. Thus, by the definition of a linear transformation,
$$T\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = T\left(\alpha \begin{bmatrix} 1 \\ 1 \end{bmatrix} + \beta \begin{bmatrix} 2 \\ 1 \end{bmatrix}\right) = \alpha T\left(\begin{bmatrix} 1 \\ 1 \end{bmatrix}\right) + \beta T\left(\begin{bmatrix} 2 \\ 1 \end{bmatrix}\right) = \alpha \begin{bmatrix} e \\ 2 \end{bmatrix} + \beta \begin{bmatrix} 5 \\ 4 \end{bmatrix} = \begin{bmatrix} e & 5 \\ 2 & 4 \end{bmatrix}\begin{bmatrix} \alpha \\ \beta \end{bmatrix}$$
$$= \begin{bmatrix} e & 5 \\ 2 & 4 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 1 & 1 \end{bmatrix}^{-1}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} e & 5 \\ 2 & 4 \end{bmatrix}\begin{bmatrix} -1 & 2 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} (5-e)x + (2e-5)y \\ 2x \end{bmatrix}.$$
2. $T : \mathbb{R}^2 \to \mathbb{R}^2$ such that $T((1, 1)^T) = (1, 2)^T$ and $T((1, -1)^T) = (5, 10)^T$?
Solution: Yes, as the set $\{(1, 1), (1, -1)\}$ is a basis of $\mathbb{R}^2$. Write $B = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$. Then,
$$T\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = T\left(BB^{-1}\begin{bmatrix} x \\ y \end{bmatrix}\right) = T\left(B\left(B^{-1}\begin{bmatrix} x \\ y \end{bmatrix}\right)\right) = \left[T\left(\begin{bmatrix} 1 \\ 1 \end{bmatrix}\right),\, T\left(\begin{bmatrix} 1 \\ -1 \end{bmatrix}\right)\right] B^{-1}\begin{bmatrix} x \\ y \end{bmatrix}$$
$$= \begin{bmatrix} 1 & 5 \\ 2 & 10 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}^{-1}\begin{bmatrix} x \\ y \end{bmatrix} = \frac{1}{2}\begin{bmatrix} 1 & 5 \\ 2 & 10 \end{bmatrix}\begin{bmatrix} x+y \\ x-y \end{bmatrix} = \begin{bmatrix} 3x - 2y \\ 6x - 4y \end{bmatrix}.$$
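Numerically, the same computation amounts to one matrix product. The sketch below (assuming NumPy; the variable names are illustrative) recovers the standard matrix of T from its values on the basis {(1, 1)ᵀ, (1, −1)ᵀ}.

```python
import numpy as np

# T is determined by its values on the basis {(1,1), (1,-1)};
# its standard matrix is [T(b1) T(b2)] B^{-1}.
B = np.array([[1.0, 1.0],
              [1.0, -1.0]])          # basis vectors as columns
TB = np.array([[1.0, 5.0],
               [2.0, 10.0]])         # T(b1), T(b2) as columns
M = TB @ np.linalg.inv(B)            # matrix of T w.r.t. the standard basis
print(M)                             # [[ 3. -2.], [ 6. -4.]]
x = np.array([2.0, 1.0])
print(M @ x)                         # T((2,1)^T) = (4, 8)^T
```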
3. T : R2 → R2 such that T ((1, 1)T ) = (1, 2)T and T ((5, 5)T ) = (5, 11)T ?
Solution: Note that the set {(1, 1), (5, 5)} is linearly dependent. Further, if such a T existed then (5, 11)T = T ((5, 5)T ) = 5T ((1, 1)T ) = 5(1, 2)T = (5, 10)T , a contradiction. Hence, there is no such linear transformation.
4. Does there exist a linear transformation T : R3 → R2 with T (1, 1, 1) = (1, 2), T (1, 2, 3) =
(4, 3) and T (2, 3, 4) = (7, 8)?
Solution: Here, the set {(1, 1, 1), (1, 2, 3), (2, 3, 4)} is linearly dependent and (2, 3, 4) =
(1, 1, 1) + (1, 2, 3). So, we need T ((2, 3, 4)) = T ((1, 1, 1) + (1, 2, 3)) = T ((1, 1, 1)) +
T ((1, 2, 3)) = (1, 2) + (4, 3) = (5, 5). But, we are given T (2, 3, 4) = (7, 8), a contradiction.
So, such a linear transformation doesn’t exist.
5. T : R2 → R2 such that T ((1, 1)T ) = (1, 2)T and T ((5, 5)T ) = (5, 10)T ?
Solution: Yes, as (5, 10)T = T ((5, 5)T ) = 5T ((1, 1)T ) = 5(1, 2)T = (5, 10)T .
6. Does there exist a linear transformation T : R3 → R2 with T (1, 1, 1) = (1, 2), T (1, 2, 3) =
(a) T 6= 0, T ◦ T = T 2 6= 0, T ◦ T ◦ T = T 3 = 0.
(b) T 6= 0, S 6= 0, S ◦ T = ST 6= 0, T ◦ S = T S = 0.
(c) S ◦ S = S 2 = T 2 = T ◦ T, S 6= T .
(d) T ◦ T = T 2 = Id, T 6= Id.
5. Let V be a vector space and let a ∈ V. Then the map Ta : V → V defined by Ta (x) = x+a,
for all x ∈ V is called the translation map. Prove that Ta ∈ L(V) if and only if a = 0.
6. Prove that there exists infinitely many linear transformations T : R3 → R2 such that
T ((1, −1, 1)T ) = (1, 2)T and T ((−1, 1, 2)T ) = (1, 0)T ?
(b) T ((1, 0, 1)T ) = (1, 2)T , T ((0, 1, 1)T ) = (1, 0)T and T ((1, 1, 2)T ) = (2, 3)T ?
AF
DR
8. Find T ∈ L(R3 ) for which Rng(T ) = LS (1, 2, 0)T , (0, 1, 1)T , (1, 3, 1)T .
10. Let T : R3 → R3 be defined by T ((x, y, z)T ) = (2x − 2y + 2z, −2x + 5y + 2z, x + y + 4z)T .
Find x ∈ R3 such that T (x) = (1, 1, −1)T .
12. Let T : R3 → R3 be defined by T ((x, y, z)T ) = (2x + 3y + 4z, −y, −3y + 4z)T . Determine
x, y, z ∈ R3 \ {0} such that T (x) = 2x, T (y) = 4y and T (z) = −z. Is the set {x, y, z}
linearly independent?
13. Does there exist a linear transformation T : R3 → Rn such that T ((1, 1, −2)T ) = x,
T ((−1, 2, 3)T ) = y and T ((1, 10, 1)T ) = z
(a) with z = x + y?
(b) with z = cx + dy, for some choice of c, d ∈ R?
14. For each matrix A given below, define T ∈ L(R2 ) by T (x) = Ax. What do these linear
operators signify geometrically?
(a) $A \in \left\{ \frac{1}{2}\begin{bmatrix} \sqrt{3} & -1 \\ 1 & \sqrt{3} \end{bmatrix},\; \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix},\; \frac{1}{2}\begin{bmatrix} 1 & -\sqrt{3} \\ \sqrt{3} & 1 \end{bmatrix},\; \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix},\; \begin{bmatrix} \cos\frac{2\pi}{3} & -\sin\frac{2\pi}{3} \\ \sin\frac{2\pi}{3} & \cos\frac{2\pi}{3} \end{bmatrix} \right\}$.
(b) $A \in \left\{ \frac{1}{2}\begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix},\; \frac{1}{5}\begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix},\; \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix},\; \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \right\}$.
(c) $A \in \left\{ \frac{1}{2}\begin{bmatrix} \sqrt{3} & 1 \\ 1 & -\sqrt{3} \end{bmatrix},\; \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix},\; \frac{1}{2}\begin{bmatrix} 1 & \sqrt{3} \\ \sqrt{3} & -1 \end{bmatrix},\; \begin{bmatrix} \cos\frac{2\pi}{3} & \sin\frac{2\pi}{3} \\ \sin\frac{2\pi}{3} & -\cos\frac{2\pi}{3} \end{bmatrix} \right\}$.
15. Consider the space C3 over C. If f ∈ L(C3 ) with f (x) = x, f (y) = (1 + i)y and f (z) =
(2 + 3i)z, for x, y, z ∈ C3 \ {0} then prove that {x, y, z} forms a basis of C3 .
5.2 Rank-Nullity Theorem

Definition 5.2.1. Let f ∈ L(V, W). Then the null space of f , denoted Null(f ) or Ker(f ), is given by Null(f ) = {v ∈ V | f (v) = 0}. In most linear algebra books, it is also called the kernel of f and written Ker(f ). Further, if V is finite dimensional then one writes Nullity(f ) = dim(Null(f )) and Rank(f ) = dim(Rng(f )).
Then, by definition,
2. Fix B ∈ M2 (R). Now, define T : M2 (R) → M2 (R) by T (A) = BA−AB, for all A ∈ M2 (R).
Solution: Then A ∈ Null(T ) if and only if A commutes with B. In particular,
{I, B, B 2 , . . .} ⊆ Null(T ). For example, if B = αI, for some α then Null(T ) = M2 (R).
2. Define T ∈ L(R2 , R4 ) by T ((x, y)T ) = (x+y, x−y, 2x+y, 3x−4y)T . Determine Null(T ).
3. Describe Null(D) and Rng(D), where D ∈ L(R[x; n]) is defined by (D(f ))(x) = f 0 (x),
the differentiation with respect to x. Note that Rng(D) ⊆ R[x; n − 1].
4. Define T ∈ L(R[x]) by (T (f ))(x) = xf (x), for all f (x) ∈ L(R[x]). What can you say
about Null(T ) and Rng(T )?
Theorem 5.2.4. Let V and W be vector spaces over F and let T ∈ L(V, W).
1. If S ⊆ V is linearly dependent then T (S) = {T (v) | v ∈ S} is linearly dependent.
2. Suppose S ⊆ V such that T (S) is linearly independent then S is linearly independent.
Proof. Part 1: As S is linearly dependent, there exist $k \in \mathbb{N}$ and $v_i \in S$, for $1 \le i \le k$, such that the system $\sum_{i=1}^{k} x_i v_i = 0$, in the unknowns $x_i$'s, has a non-trivial solution, say $x_i = a_i \in \mathbb{F}$, $1 \le i \le k$. Thus $\sum_{i=1}^{k} a_i v_i = 0$. Then the $a_i$'s also give a non-trivial solution to the system $\sum_{i=1}^{k} y_i T(v_i) = 0$, where the $y_i$'s are unknowns, as
$$\sum_{i=1}^{k} a_i T(v_i) = \sum_{i=1}^{k} T(a_i v_i) = T\Big(\sum_{i=1}^{k} a_i v_i\Big) = T(0) = 0.$$
Hence the required result follows.
Part 2 : On the contrary assume that S is linearly dependent. Then by Part 1, T (S) is
linearly dependent, a contradiction to the given assumption that T (S) is linearly independent.
We now prove the rank-nullity Theorem. The proof of this result is similar to the proof of
Theorem 5.2.5 (Rank-Nullity Theorem). Let V and W be vector spaces over F. If dim(V) is finite and T ∈ L(V, W) then
$$\dim(\mathrm{Rng}(T)) + \dim(\mathrm{Null}(T)) = \dim(V), \quad \text{i.e.,} \quad \mathrm{Rank}(T) + \mathrm{Nullity}(T) = \dim(V).$$
Rng(T ) = LS(T (v1 ), . . . , T (vk ), T (vk+1 ), . . . , T (vn )) = LS(T (vk+1 ), . . . , T (vn )).
Hence, there exist $b_1, \ldots, b_k \in \mathbb{F}$ such that $\sum_{i=1}^{n-k} a_i v_{k+i} = \sum_{j=1}^{k} b_j v_j$. This gives a new system
$$\sum_{i=1}^{n-k} a_i v_{k+i} + \sum_{j=1}^{k} (-b_j) v_j = 0,$$
in the unknowns $a_i$'s and $b_j$'s. As $C$ is linearly independent, the new system has only the trivial solution, namely $[a_1, \ldots, a_{n-k}, -b_1, \ldots, -b_k]^T = 0$. Hence, the system $\sum_{i=1}^{n-k} a_i T(v_{k+i}) = 0$ has only the trivial solution. Thus, the set $\{T(v_{k+1}), \ldots, T(v_n)\}$ is a linearly independent subset of W. It also spans $\mathrm{Rng}(T)$ and hence is a basis of $\mathrm{Rng}(T)$. Therefore, $\dim(\mathrm{Rng}(T)) = n - k = \dim(V) - \dim(\mathrm{Null}(T))$, as required.
Corollary 5.2.6. Let V and W be finite dimensional vector spaces over F and let T ∈ L(V, W).
If dim(V) = dim(W) then the following statements are equivalent.
1. T is one-one.
2. Ker(T ) = {0}.
3. T is onto.
5.3 Algebra of Linear Transformations

Definition 5.3.1. Let V, W be vector spaces over F and let S, T ∈ L(V, W). Then, we define the point-wise sum S + T and scalar multiple αT, for α ∈ F, by (S + T)(v) = S(v) + T(v) and (αT)(v) = αT(v), for all v ∈ V.
Then verify that the above maps correspond to the following collection of matrices:
$$f_{11} = \begin{bmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix},\; f_{12} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \\ 0 & 0 \end{bmatrix},\; f_{21} = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 0 \end{bmatrix},\; f_{22} = \begin{bmatrix} 0 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix},\; f_{31} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 1 & 0 \end{bmatrix},\; f_{32} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 1 \end{bmatrix}.$$
Theorem 5.3.2. Let V and W be vector spaces over F. Then L(V, W) is a vector space over
F. Furthermore, if dim V = n and dim W = m, then dim L(V, W) = mn.
Proof. It can be easily verified that under point-wise addition and scalar multiplication, defined
above, L(V, W) is indeed a vector space over F. We now prove the other part. So, let us
assume that $B = \{v_1, \ldots, v_n\}$ and $C = \{w_1, \ldots, w_m\}$ are bases of V and W, respectively. For $1 \le i \le n$ and $1 \le j \le m$, define the maps $f_{ji}$ on the basis vectors of V by
$$f_{ji}(v_k) = \begin{cases} w_j, & \text{if } k = i \\ 0, & \text{if } k \ne i. \end{cases}$$
For other vectors of V, we extend the definition by linearity, i.e., if $v = \sum_{s=1}^{n} \alpha_s v_s$ then
$$f_{ji}(v) = f_{ji}\Big(\sum_{s=1}^{n} \alpha_s v_s\Big) = \sum_{s=1}^{n} \alpha_s f_{ji}(v_s) = \alpha_i f_{ji}(v_i) = \alpha_i w_j. \tag{5.3.1}$$
Thus fji ∈ L(V, W). We now show that {fji |1 ≤ i ≤ n, 1 ≤ j ≤ m} is a basis of L(V, W).
As a first step, we show that the $f_{ji}$'s are linearly independent. So, consider the linear system $\sum_{i=1}^{n}\sum_{j=1}^{m} c_{ji} f_{ji} = 0$, in the unknowns $c_{ji}$'s, for $1 \le i \le n$, $1 \le j \le m$. Using the point-wise addition and scalar multiplication, we get
$$0 = 0(v_k) = \Big(\sum_{i=1}^{n}\sum_{j=1}^{m} c_{ji} f_{ji}\Big)(v_k) = \sum_{i=1}^{n}\sum_{j=1}^{m} c_{ji} f_{ji}(v_k) = \sum_{j=1}^{m} c_{jk} w_j.$$
But, the set {w1 , . . . , wm } is linearly independent. Hence the only solution equals cjk = 0, for
1 ≤ j ≤ m. Now, as we vary vk from v1 to vn , we see that cji = 0, for 1 ≤ j ≤ m and 1 ≤ i ≤ n.
Thus, we have proved the linear independence of {fji |1 ≤ i ≤ n, 1 ≤ j ≤ m}.
Now, let us prove that $LS(\{f_{ji} \mid 1 \le i \le n, 1 \le j \le m\}) = L(V, W)$. So, let $f \in L(V, W)$. Then, for $1 \le s \le n$, $f(v_s) \in W$ and hence there exist $\beta_{ts}$'s such that $f(v_s) = \sum_{t=1}^{m} \beta_{ts} w_t$. So, if
$v = \sum_{s=1}^{n} \alpha_s v_s \in V$ then, using Equation (5.3.1), we get
$$f(v) = f\Big(\sum_{s=1}^{n} \alpha_s v_s\Big) = \sum_{s=1}^{n} \alpha_s f(v_s) = \sum_{s=1}^{n} \alpha_s \Big(\sum_{t=1}^{m} \beta_{ts} w_t\Big) = \sum_{s=1}^{n}\sum_{t=1}^{m} \beta_{ts} (\alpha_s w_t) = \sum_{s=1}^{n}\sum_{t=1}^{m} \beta_{ts} f_{ts}(v) = \Big(\sum_{s=1}^{n}\sum_{t=1}^{m} \beta_{ts} f_{ts}\Big)(v).$$
Since the above is true for every $v \in V$, we get $f = \sum_{s=1}^{n}\sum_{t=1}^{m} \beta_{ts} f_{ts}$. Thus, we conclude that $f \in LS(\{f_{ji} \mid 1 \le i \le n, 1 \le j \le m\})$. Hence, $LS(\{f_{ji} \mid 1 \le i \le n, 1 \le j \le m\}) = L(V, W)$ and
thus the required result follows.
We now give a corollary of the rank-nullity theorem.
Corollary 5.3.3. Let V be a vector space over F with dim(V) = n. If S, T ∈ L(V) then
Proof. The proof of Part 2 is omitted as it directly follows from Part 1 and Theorem 5.2.5.
Part 1 - Second Inequality: Suppose v ∈ Ker(T ). Then
Remark 5.3.5. Let f : S → T be invertible. Then, it can be easily shown that any right inverse
and any left inverse are the same. Thus, the inverse function is unique and is denoted by f −1 .
It is well known that f is invertible if and only if f is both one-one and onto.
Lemma 5.3.6. Let V and W be vector spaces over F and let T ∈ L(V, W). If T is one-one and
onto then, the map T −1 : W → V is also a linear transformation. The map T −1 is called the
inverse linear transform of T and is defined by T −1 (w) = v whenever T (v) = w.
Proof. Part 1: As T is one-one and onto, by Theorem 5.2.5, dim(V) = dim(W). So, by
Corollary 5.2.6, for each w ∈ W there exists a unique v ∈ V such that T (v) = w. Thus, one
defines T −1 (w) = v.
We need to show that T −1 (α1 w1 + α2 w2 ) = α1 T −1 (w1 ) + α2 T −1 (w2 ), for all α1 , α2 ∈ F
and w1 , w2 ∈ W. Note that by previous paragraph, there exist unique vectors v1 , v2 ∈ V such
that T −1 (w1 ) = v1 and T −1 (w2 ) = v2 . Or equivalently, T (v1 ) = w1 and T (v2 ) = w2 . So,
T (α1 v1 + α2 v2 ) = α1 w1 + α2 w2 , for all α1 , α2 ∈ F. Hence, for all α1 , α2 ∈ F, we get
Definition 5.3.8. Let V and W be vector spaces over F and let T ∈ L(V, W). Then, T is said to be singular if $\{0\} \subsetneq \mathrm{Ker}(T)$, i.e., Ker(T) contains a non-zero vector. If Ker(T) = {0} then, T is called non-singular.
" #! x
Example 5.3.9. Let $T \in L(\mathbb{R}^2, \mathbb{R}^3)$ be defined by $T\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} x \\ y \\ 0 \end{bmatrix}$. Then, verify that T is non-singular. Is T invertible?
Theorem 5.3.10. Let V and W be vector spaces over F and let T ∈ L(V, W). Then the
following statements are equivalent.
1. T is one-one.
2. T is non-singular.
3. If S ⊆ V is linearly independent then T(S) is a linearly independent subset of W.
Proof. 1⇒2 On the contrary, let T be singular. Then, there exists v 6= 0 such that T (v) =
0 = T (0). This implies that T is not one-one, a contradiction.
2⇒3 Let S ⊆ V be linearly independent. Let, if possible, T(S) be linearly dependent. Then, there exist $v_1, \ldots, v_k \in S$ and $\alpha = (\alpha_1, \ldots, \alpha_k)^T \ne 0$ such that $\sum_{i=1}^{k} \alpha_i T(v_i) = 0$. Thus, $T\big(\sum_{i=1}^{k} \alpha_i v_i\big) = 0$. But T is non-singular and hence we get $\sum_{i=1}^{k} \alpha_i v_i = 0$ with $\alpha \ne 0$, a
contradiction to S being a linearly independent set.
3⇒1 Suppose that T is not one-one. Then, there exists x, y ∈ V such that x 6= y but
T (x) = T (y). Thus, we have obtained S = {x − y}, a linearly independent subset of V with
T (S) = {0}, a linearly dependent set. A contradiction to our assumption. Thus, the required
result follows.
Definition 5.3.11. Let V and W be vector spaces over F and let T ∈ L(V, W). Then, T is
said to be an isomorphism if T is one-one and onto. The vector spaces V and W are said to be isomorphic, denoted $V \cong W$, if there is an isomorphism from V to W.
We now give a formal proof of the statement that every finite dimensional vector space V
over F looks like Fn , where n = dim(V).
As a direct application using the countability argument, one obtains the following result
Corollary 5.3.13. The vector space R over Q is not finite dimensional. Similarly, the vector
space C over Q is not finite dimensional.
We now summarize the different definitions related with a linear operator on a finite dimen-
sional vector space. The proof basically uses the rank-nullity theorem and they appear in some
form in previous results. Hence, we leave the proof for the reader.
Theorem 5.3.14. Let V be a finite dimensional vector space over F with dim V = n. Then the
following statements are equivalent for T ∈ L(V).
1. T is one-one.
2. Ker(T ) = {0}.
3. Rank(T ) = n.
4. T is onto.
5. T is an isomorphism.
7. T is non-singular.
8. T is invertible.
5.4 Ordered Bases
Let V be a vector space of dimension n over F. Then Theorem 5.3.12 implies that V is isomorphic
to Fn . So, one should be able to visualize the elements of V as an n-tuple. Further, our problem
may require us to look at a subspace W of V whose dimension is very small as compared to the
dimension of V (this is generally encountered when we work with sparse matrices or whenever
we do computational work). It may also be possible that a basis of W may not look like a
standard basis of Fn , where the coefficient of ei gave the i-th component of the vector. We start
with the following example. Note that we will be using ‘small brackets’ in place of ‘braces’ to
represent a basis.
Example 5.4.1.
1. Let $f(x) = 1 - x^2 \in \mathbb{R}[x; 2]$. If $B = (1, x, x^2)$ is a basis of $\mathbb{R}[x; 2]$ then $f(x) = \begin{bmatrix} 1 & x & x^2 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}$.
" #
3 5
So, from Example 5.4.1 we conclude the following: let V be a vector space of dimension n over F. If we fix a basis, say $B = (u_1, u_2, \ldots, u_n)$, of V and if $v \in V$ with $v = \sum_{i=1}^{n} \alpha_i u_i$ then
$$v = [u_1, u_2, \ldots, u_n]\begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{bmatrix} = [u_2, u_1, \ldots, u_n]\begin{bmatrix} \alpha_2 \\ \alpha_1 \\ \vdots \\ \alpha_n \end{bmatrix}.$$
Note the change in the first two components of the column vectors which are elements of Fn .
So, a change in the position of the vectors ui ’s gives a change in the column vector. Hence,
if we fix the order of the basis vectors ui ’s then with respect to this order all vectors can be
thought of as elements of Fn . We use the above discussion to define an ordered basis.
Definition 5.4.2. Let W be a vector space over F with a basis B = {u1 , . . . , um }. Then, an
ordered basis for W is a basis B together with a one-to-one correspondence between B and
{1, 2, . . . , m}. Since there is an order among the elements of B, we write B = (u1 , . . . , um ). The
matrix B = [u1 , . . . , um ] containing the basis vectors of Wm and is called the basis matrix.
Example 5.4.3. Note that for Example 5.4.1.1 [1, x, x2 ] is a basis matrix, whereas for Exam-
ple 5.4.1.2, [u1 , u2 ] and [u2 , u1 ] are basis matrices.
Definition 5.4.4. Let B = [v1 , . . . , vm ] be the basis matrix corresponding to an ordered basis
B = (v1 , . . . , vm ) of W. Since B is a basis of W, for each v ∈ W, there exist βi , 1 ≤ i ≤ m,
such that $v = \sum_{i=1}^{m} \beta_i v_i = B\begin{bmatrix} \beta_1 \\ \vdots \\ \beta_m \end{bmatrix}$. The vector $\begin{bmatrix} \beta_1 \\ \vdots \\ \beta_m \end{bmatrix}$, denoted $[v]_B$, is called the coordinate vector of v with respect to B. Thus,
$$v = B[v]_B = [v_1, \ldots, v_m][v]_B, \quad \text{or equivalently,} \quad v = [v]_B^T \begin{bmatrix} v_1 \\ \vdots \\ v_m \end{bmatrix}. \tag{5.4.1}$$
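Computationally, when V = Rⁿ the coordinate vector [v]_B is obtained by solving the linear system B[v]_B = v. A small sketch (assuming NumPy; the basis below is chosen only for illustration):

```python
import numpy as np

# Coordinate vector of Definition 5.4.4 in R^3: solve B [v]_B = v.
B = np.array([[1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])      # basis matrix, basis vectors as columns
v = np.array([3.0, 2.0, 1.0])
coord = np.linalg.solve(B, v)        # [v]_B
print(coord)                         # [1. 1. 1.]
print(B @ coord)                     # reconstructs v
```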
Thus, a little thought implies that Mm,n (R) can be mapped to Rmn with respect to the
ordered basis B = (e11 , . . . , e1n , e21 , . . . , e2n , . . . , em1 , . . . , emn ) of Mm,n (R).
The next definition relates the coordinates of a vector with respect to two distinct ordered
bases. This allows us to move from one ordered basis to another ordered basis.
Definition 5.4.8. Let V be a vector space over F with dim(V) = n. Let A = [v1 , . . . , vn ] and
B = [u1 , . . . , un ] be basis matrices corresponding to the ordered bases A and B, respectively, of
V. Thus, continuing with the symbolic expression in Equation (5.4.1), we have
$$A = [v_1, \ldots, v_n] = \big[B[v_1]_B, \ldots, B[v_n]_B\big] = B[A]_B, \tag{5.4.2}$$
where $[A]_B = [[v_1]_B, \ldots, [v_n]_B]$ is called the matrix of A with respect to the ordered basis
B or the change of basis matrix from A to B.
We now summarize the ideas related with ordered bases. This also helps us to understand
the nomenclature ‘change of basis matrix’ for the matrix [A]B .
Theorem 5.4.9. Let V be a vector space over F with dim(V) = n. Further, let A = (v1 , . . . , vn )
and B = (u1 , . . . , un ) be two ordered bases of V.
1. Then the matrix [A]B is invertible. Further, Equation (5.4.2) gives [A]B = B −1 A.
2. Similarly, the matrix [B]A is invertible and [B]A = A−1 B.
3. Moreover, [x]B = [A]B [x]A , for all x ∈ V, i.e., [A]B takes coordinate vector of x with
respect to A to the coordinate vector of x with respect to B.
Proof. Part 1: Note that using Equation (5.4.3), we see that the matrix [A]B takes a linearly
independent set to another linearly independent set. Hence, by Exercise 3.3.17, the matrix [A]B
is invertible, which proves Part 1. A similar argument gives Part 2.
Part 3: Using Equation (5.4.2), [x]B = B −1 x = B −1 (AA−1 )x = (B −1 A)(A−1 x) = [A]B [x]A ,
for all x ∈ V. A similar argument gives Part 4 and clearly Part 5.
Example 5.4.10.
1. Let V = Rn , A = [v1 , . . . , vn ] and B = (e1 , . . . , en ) be the standard ordered basis. Then
A = [v1 , . . . , vn ] = [[v1 ]B , . . . , [vn ]B ] = [A]B .
2. Suppose A = (1, 0, 0)T , (1, 1, 0)T , (1, 1, 1)T and B = (1, 1, 1)T , (1, −1, 1)T , (1, 1, 0)T are
two ordered bases of R3 . Then, we verify the statements in the previous result.
(a) Using Equation (5.4.2), $\begin{bmatrix} x \\ y \\ z \end{bmatrix}_A = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}^{-1}\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} x - y \\ y - z \\ z \end{bmatrix}$.
(b) Similarly, $\begin{bmatrix} x \\ y \\ z \end{bmatrix}_B = \begin{bmatrix} 1 & 1 & 1 \\ 1 & -1 & 1 \\ 1 & 1 & 0 \end{bmatrix}^{-1}\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \frac{1}{2}\begin{bmatrix} -1 & 1 & 2 \\ 1 & -1 & 0 \\ 2 & 0 & -2 \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \frac{1}{2}\begin{bmatrix} -x + y + 2z \\ x - y \\ 2x - 2z \end{bmatrix}$.
(c) $[A]_B = \begin{bmatrix} -1/2 & 0 & 1 \\ 1/2 & 0 & 0 \\ 1 & 1 & 0 \end{bmatrix}$, $[B]_A = \begin{bmatrix} 0 & 2 & 0 \\ 0 & -2 & 1 \\ 1 & 1 & 0 \end{bmatrix}$ and $[A]_B [B]_A = I_3$.
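These identities are easy to verify numerically. The sketch below (assuming NumPy) recomputes [A]_B and [B]_A for the bases of this example and checks [A]_B[B]_A = I₃ and [x]_B = [A]_B[x]_A.

```python
import numpy as np

# Basis matrices with basis vectors as columns; [A]_B = B^{-1} A, [B]_A = A^{-1} B.
A = np.array([[1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
B = np.array([[1.0, 1.0, 1.0],
              [1.0, -1.0, 1.0],
              [1.0, 1.0, 0.0]])
A_in_B = np.linalg.solve(B, A)               # [A]_B
B_in_A = np.linalg.solve(A, B)               # [B]_A
print(np.allclose(A_in_B @ B_in_A, np.eye(3)))   # True

x = np.array([1.0, 2.0, 3.0])
x_A = np.linalg.solve(A, x)                  # [x]_A
x_B = np.linalg.solve(B, x)                  # [x]_B
print(np.allclose(x_B, A_in_B @ x_A))        # [x]_B = [A]_B [x]_A
```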
Exercise 5.4.11. In R3 , let A = (1, 2, 0)T , (1, 3, 2)T , (0, 1, 3)T be an ordered basis.
1. If B = (1, 2, 1)T , (0, 1, 2)T , (1, 4, 6)T is another ordered basis of R3 . Then, determine
= B[T (v1 )]B · · · B[T (vn )]B [x]A = B [T (v1 )]B · · · [T (vn )]B [x]A .
When there is no mention of bases, we take it to be the standard ordered bases and denote
the corresponding matrix by [T ]. Also, note that for each x ∈ V, the matrix T [A, B][x]A is
the coordinate vector of T (x) with respect to the ordered basis B of the co-domain. Thus,
the matrix T [A, B] takes coordinate vector of the domain points to the coordinate vector of its
images. The above discussion is stated as the next result.

Figure 5.1: Matrix of the Linear Transformation
See Figure 5.1 for clarity on which basis occurs at which place.
Remark 5.5.3. Let V and W be vector spaces over F with ordered bases A1 = (v1 , . . . , vn )
and B1 = (w1 , . . . , wm ), respectively. Also, for α ∈ F with α 6= 0, let A2 = (αv1 , . . . , αvn ) and
B2 = (αw1 , . . . , αwm ) be another set of ordered bases of V and W, respectively. Then, for any
T ∈ L(V, W)
$$T[A_2, B_2] = \big[[T(\alpha v_1)]_{B_2} \cdots [T(\alpha v_n)]_{B_2}\big] = \big[[T(v_1)]_{B_1} \cdots [T(v_n)]_{B_1}\big] = T[A_1, B_1].$$
Thus, the same matrix can be the matrix representation of T for two different pairs of bases.
We now give a few examples to understand the above discussion and Theorem 5.5.2.
Figure 5.2: Counter-clockwise Rotation by an angle θ
2 2
T [B, B] = T , T = , = 3 3
1 −1 −1 3 2 −2
DR
B B B B
" # " # " # " #
2 2 0 −1 0
as = B −1 and =B .
−1 −1 3 3
B B
5. Define T ∈ L(C3 ) by T (x) = x, for all x ∈ C3 . Note that T is the Id map. De-
termine the coordinate matrix with respect to the ordered basis A = e1 , e2 , e3 and
B = (1, 0, 0), (1, 1, 0), (1, 1, 1) .
Solution: By definition, verify that
$$T[A, B] = \big[[T(e_1)]_B, [T(e_2)]_B, [T(e_3)]_B\big] = \left[\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}_B, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}_B, \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}_B\right] = \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{bmatrix}$$
and
$$T[B, A] = \left[\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}_A, \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}_A, \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}_A\right] = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}.$$
Thus, verify that T [B, A]−1 = T [A, B] and T [A, A] = T [B, B] = I3 as the given map is
indeed the identity map.
We now give a remark which relates the above ideas with respect to matrix multiplication.
Remark 5.5.5. 1. Fix S ∈ Mn (C) and define T ∈ L(Cn ) by T (x) = Sx, for all x ∈ Cn . If
A is the standard basis of Cn then [T ] = S as
[T ][:, i] = [T (ei )]A = [S(ei )]A = [S[:, i]]A = S[:, i], for 1 ≤ i ≤ n.
2. Fix S ∈ Mm,n (C) and define T ∈ L(Cn , Cm ) by T (x) = Sx, for all x ∈ Cn . Let A and B
be the standard ordered bases of Cn and Cm , respectively. Then T [A, B] = S as
(T [A, B])[:, i] = [T (ei )]B = [Sei ]B = [S[:, i]]B = S[:, i], for 1 ≤ i ≤ n.
3. Fix S ∈ Mn (C) and define T ∈ L(Cn ) by T (x) = Sx, for all x ∈ Cn . Let A = (v1 , . . . , vn )
and B = (u1 , . . . , un ) be two ordered bases of Cn with respective basis matrices A and B.
Then
$$T[A, B] = \big[[T(v_1)]_B \;\cdots\; [T(v_n)]_B\big] = \big[B^{-1}T(v_1) \;\cdots\; B^{-1}T(v_n)\big] = \big[B^{-1}Sv_1 \;\cdots\; B^{-1}Sv_n\big] = B^{-1}S\big[v_1 \;\cdots\; v_n\big] = B^{-1}SA.$$
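The formula T[A, B] = B⁻¹SA can be checked directly. In the sketch below (assuming NumPy), the matrices S, A and B are sample choices used only for illustration; column i of the computed matrix is the B-coordinate vector of T(vᵢ).

```python
import numpy as np

# For T(x) = Sx, the matrix w.r.t. basis matrices A (domain) and B (co-domain)
# is B^{-1} S A.
S = np.array([[7.0, -2.0],
              [2.0, 2.0]])
A = np.array([[1.0, 0.0],
              [1.0, 1.0]])                    # columns: domain basis vectors
B = np.array([[2.0, 1.0],
              [1.0, 1.0]])                    # columns: co-domain basis vectors
T_AB = np.linalg.solve(B, S @ A)              # B^{-1} S A
v1 = A[:, 0]
print(np.allclose(B @ T_AB[:, 0], S @ v1))    # column 0 is [T(v1)]_B  -> True
```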
2. [Finding T from T [A, B]] Let V and W be vector spaces over F with ordered bases A and
B, respectively. Suppose we are given the matrix S = T [A, B]. Then to determine the
corresponding T ∈ L(V, W), we go back to the symbolic expression in Equation (5.4.1)
and Theorem 5.5.2. We see that
(a) T (v) = B[T (v)]B = BT [A, B][v]A = BS[v]A .
(b) In particular, if V = W = Fn and A = B then T (v) = BSB −1 v.
(c) Further, if B is the standard ordered basis then T (v) = Sv.
Exercise 5.5.7. 1. Relate Remark 5.5.5.3 with Theorem 5.4.9 as Id is the identity map.
3. Let T ∈ L(R2 ) represent the reflection about the line y = mx. Find [T ].
4. Let T ∈ L(R3 ) represent the reflection about/across the X-axis. Find [T ]. What about the
reflection across the XY -plane?
5. Let T ∈ L(R3 ) represent the counter-clockwise rotation around the positive Z-axis by an
angle θ, 0 ≤ θ < 2π. Find its matrix with respect to the standard ordered basis of $\mathbb{R}^3$. [Hint: Is $\begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}$ the required matrix?]
6. Define a function D ∈ L(R[x; n]) by D(f (x)) = f 0 (x). Find the matrix of D with respect
to the standard ordered basis of R[x; n]. Observe that Rng(D) ⊆ R[x; n − 1].
This idea can be generalized to any finite dimensional vector space. To do so, we start with
the matrix of the composition of two linear transformations. This also helps us to relate matrix
multiplication with composition of two functions.
So, for all u ∈ V, we get (S[C, D] · T [B, C]) [u]B = [(ST )(u)]D = (ST )[B, D] [u]B . Hence
(ST ) [B, D] = S[C, D] · T [B, C].
As an immediate corollary of Theorem 5.6.1 we see that the matrix of the inverse linear
transform is the inverse of the matrix of the linear transform, whenever the inverse exists.
Theorem 5.6.2 (Inverse of a Linear Transformation). Let V be a vector space with dim(V) = n.
If T ∈ L(V) is invertible then for any ordered basis B and C of the domain and co-domain,
respectively, one has (T [C, B])−1 = T −1 [B, C]. That is, the inverse of the coordinate matrix of
T is the coordinate matrix of the inverse linear transform.
Proof. As T is invertible, $TT^{-1} = \mathrm{Id}$. Thus, Remark 5.5.5.3 and Theorem 5.6.1 imply
$$I_n = \mathrm{Id}[B, B] = (T \circ T^{-1})[B, B] = T[C, B] \cdot T^{-1}[B, C].$$
Hence, by definition of inverse, $T^{-1}[B, C] = (T[C, B])^{-1}$ and the required result follows.
Exercise 5.6.3. Find the matrix of the linear transformations given below.
T (x) = 1 + x, T (x2 ) = (1 + x)2 and T (x3 ) = (1 + x)3 . Prove that T is invertible. Also,
Let V be a finite dimensional vector space. Then, the next result answers the question “what
happens to the matrix T [B, B] if the ordered basis B changes to C?”
Figure 5.4: $T[C, C] = \mathrm{Id}[B, C] \cdot T[B, B] \cdot (\mathrm{Id}[B, C])^{-1}$ — Similarity of Matrices
Theorem 5.6.4. Let B = (u1 , . . . , un ) and C = (v1 , . . . , vn ) be two ordered bases of V and Id the identity operator. Then, for any linear operator T ∈ L(V),
$$T[C, C] = \mathrm{Id}[B, C] \cdot T[B, B] \cdot \mathrm{Id}[C, B] = \mathrm{Id}[B, C] \cdot T[B, B] \cdot (\mathrm{Id}[B, C])^{-1}.$$
Proof. As Id is the identity operator, the composite functions (T ◦ Id), (Id ◦ T ) from (V, B) to
(V, C) are equal (see Figure 5.4 for clarity). Hence, their matrix representations with respect to
ordered bases B and C are equal. Thus, (T ◦ Id)[B, C] = T[B, C] = (Id ◦ T)[B, C]. Hence, using Theorem 5.6.1, we get
$$T[C, C] \cdot \mathrm{Id}[B, C] = \mathrm{Id}[B, C] \cdot T[B, B], \quad \text{i.e.,} \quad T[C, C] = \mathrm{Id}[B, C] \cdot T[B, B] \cdot (\mathrm{Id}[B, C])^{-1}.$$
Definition 5.6.5. Let V be a vector space with ordered bases B and C. If T ∈ L(V) then,
T [C, C] = Id[B, C] · T [B, B] · Id[C, B]. The matrix Id[B, C] is called the change of basis matrix
(also, see Theorem 5.4.9) from B to C.
Definition 5.6.6. Let X, Y ∈ Mn (C). Then, X and Y are said to be similar if there exists a
non-singular matrix P such that P −1 XP = Y ⇔ X = P Y P −1 ⇔ XP = P Y .
bases of R[x; 2]. Then, verify that Id[B, C]−1 = Id[C, B], as
−1 1 −2
1 0 1
0 −1 1
Id[B, C] = [[1 + x]C , [1 + 2x + x2 ]C , [2 + x]C ] =
1 1 1 .
0 1 0
Exercise 5.6.8. 1. Let A ∈ Mn (R) such that tr(A) = 0. Then prove that there exists a
non-singular matrix S such that SAS −1 = B with B = [bij ] and bii = 0, for 1 ≤ i ≤ n.
2. Let V be a vector space with dim(V) = n. Let T ∈ L(V) satisfy T n−1 6= 0 but T n = 0.
Then, use Exercise 5.1.13.2 to get an ordered basis B = u, T (u), . . . , T n−1 (u) of V.
(a) Now, prove that $T[B, B] = \begin{bmatrix} 0 & 0 & 0 & \cdots & 0 \\ 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix}$.
(b) Let A ∈ Mn (C) satisfy An−1 6= 0 but An = 0. Then, prove that A is similar to the
matrix given in Part 2a.
3. Let A be an ordered basis of a vector space V over F with dim(V) = n. Then prove that
the set of all possible matrix representations of T is given by (also see Definition 5.6.5)
4. Let B1 (α, β) = {(x, y)T ∈ R2 : (x − α)2 + (y − β)2 ≤ 1}. Then, can we get a linear
transformation T ∈ L(R2 ) such that T (S) = W , where S and W are given below?
5. Let V, W be vector spaces over F with dim(V) = n and dim(W) = m and ordered bases
B and C, respectively. Define IB,C : L(V, W) → Mm,n (F) by IB,C (T ) = T [B, C]. Show
that IB,C is an isomorphism. Thus, when bases are fixed, the number of m × n matrices
is same as the number of linear transformations.
of R3 . Then find
(b) the matrix P such that P −1 T [B, B] P = T [C, C]. Note that P = Id[C, B].
As an application of the ideas and results related with orthogonality, we would like to go back
to the system of linear equations. So, recall that we started with the solution set of the linear
system Ax = b, for A ∈ Mm,n (C), x ∈ Cn and b ∈ Cm . We saw that if b ∈ Col(A) then the
system Ax = b is consistent and one can use the Gauss-Jordan method to get the solution set
of Ax = b. If the system is inconsistent can we talk of the ‘best possible solution’ ? How do we
define ‘Best’ ?
In most practical applications, the linear systems are inconsistent due to various reasons.
The reasons could be related with human error, or computational/rounding-off error or missing
data or there is not enough time to solve the whole linear system. So, we need to go beyond consistent linear systems. In quite a few such cases, we are interested in finding a point x ∈ Rn for which the norm of the error vector, ‖b − Ax‖, is the least. Thus, we consider the problem of finding x0 ∈ Rn such that
$$\|b - Ax_0\| = \min\{\|b - Ax\| : x \in \mathbb{R}^n\}.$$
Definition 5.7.2. Let W be a finite dimensional subspace of an ips V. Then, by Theorem 5.7.1,
for each v ∈ V there exist unique vectors w ∈ W and u ∈ W⊥ with v = w + u. We thus define
the orthogonal projection of V onto W, denoted PW , by
PW : V → W by PW (v) = w.
So, note that the solution x0 ∈ Rn satisfying kb − Ax0 k = min{kb − Axk : x ∈ Rn } is the
projection of b on the Col(A).
Remark 5.7.3. Let A ∈ Mm,n (R) and W = Col(A). Then, to find the orthogonal projection
PW (b), we can use either of the following ideas:
1. Determine an orthonormal basis $\{f_1, \ldots, f_k\}$ of $\mathrm{Col}(A)$. Then $P_W(b) = \sum_{i=1}^{k} \langle b, f_i\rangle f_i$. Note that
$$P_W(b) = \sum_{i=1}^{k} \langle b, f_i\rangle f_i = \sum_{i=1}^{k} f_i (f_i^T b) = \Big(\sum_{i=1}^{k} f_i f_i^T\Big) b = P b,$$
where $P = \sum_{i=1}^{k} f_i f_i^T$ is called the projection matrix of $\mathbb{R}^m$ onto $\mathrm{Col}(A)$.
2. By Theorem 3.6.5.2, Col(A) = Null(AT )⊥ . Hence, for b ∈ Rm there exists unique
u ∈ Col(A) and v ∈ Null(AT ) such that b = u + v. Thus, using Definition 5.7.2 and
Theorem 5.7.1, PW (b) = u.
Corollary 5.7.4. Let A ∈ Mm,n (R) and b ∈ Rm . Then, x0 is a least square solution of Ax = b
if and only if x0 is a solution of the system AT Ax = AT b.
Proof. As b ∈ Rm , by Remark 5.7.3, there exists y ∈ Col(A) and v ∈ Null(AT ) such that
b = y + v and min{kb − wk | w ∈ Col(A)} = kb − yk. As y ∈ Col(A), there exists x0 ∈ Rn
such that Ax0 = y, i.e.,
Thus, the vectors b − Ax1 and Ax1 − Ax are orthogonal and hence
1 4/3
1 1 1
2. Find the foot of the perpendicular from the point v = (1, 2, 3, 4)T on the plane generated
by the vectors (1, 1, 0, 0)T , (1, 0, 1, 0)T and (0, 1, 1, 1)T .
(a) Method 1: Note that the three vectors lie on the plane x − y − z − 2w = 0. Then
r = (1, −1, −1, 2)T is the normal vector of the plane. Hence
4 1
v − Projr v = (1, 2, 3, 4)T − (1, −1, −1, 2)T = (3, 18, 25, 20)T
7 7
is the required projection of v.
(b) Method 2: Using the Gram-Schmidt process, we get
1 1 1
w1 = √ (1, 1, 0, 0)T , w2 = √ (1, −1, 2, 0)T , w3 = √ (−2, 2, 2, 3)T
2 6 21
as an orthonormal basis of the plane generated by the vectors (1, 1, 0, 0)T , (1, 0, 1, 0)T
and $(0, 1, 1, 1)^T$. Thus, the projection matrix equals
$$P = \sum_{i=1}^{3} w_i w_i^T = \begin{bmatrix} 6/7 & 1/7 & 1/7 & -2/7 \\ 1/7 & 6/7 & -1/7 & 2/7 \\ 1/7 & -1/7 & 6/7 & 2/7 \\ -2/7 & 2/7 & 2/7 & 3/7 \end{bmatrix} \quad \text{and} \quad Pv = \frac{1}{7}\begin{bmatrix} 3 \\ 18 \\ 25 \\ 20 \end{bmatrix}.$$
(c) Method 3: Let $A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}$. Then we need $x_0$ satisfying $(A^T A)x = A^T b$. Here
$$A^T A = \begin{bmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 3 \end{bmatrix} \quad \text{and} \quad A^T b = \begin{bmatrix} 3 \\ 4 \\ 9 \end{bmatrix}.$$
Note that $(A^T A)^{-1} = \frac{1}{7}\begin{bmatrix} 5 & -2 & -1 \\ -2 & 5 & -1 \\ -1 & -1 & 3 \end{bmatrix}$ and hence the solution of the system $(A^T A)x = A^T b$ equals
$$x = (A^T A)^{-1}(A^T b) = \frac{1}{7}\begin{bmatrix} 5 & -2 & -1 \\ -2 & 5 & -1 \\ -1 & -1 & 3 \end{bmatrix}\begin{bmatrix} 3 \\ 4 \\ 9 \end{bmatrix} = \frac{1}{7}\begin{bmatrix} -2 \\ 5 \\ 20 \end{bmatrix}.$$
Thus, $Ax = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix} \cdot \frac{1}{7}\begin{bmatrix} -2 \\ 5 \\ 20 \end{bmatrix} = \frac{1}{7}\begin{bmatrix} 3 \\ 18 \\ 25 \\ 20 \end{bmatrix}$ is the nearest vector to $v = (1, 2, 3, 4)^T$.
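Method 3 is exactly what one would do in code: solve the normal equations and compare with the projection matrix. A minimal sketch (assuming NumPy) for this example:

```python
import numpy as np

# Solve the normal equations A^T A x = A^T b and compare with the projection.
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
b = np.array([1.0, 2.0, 3.0, 4.0])

x0 = np.linalg.solve(A.T @ A, A.T @ b)        # least squares solution
print(7 * x0)                                 # [-2.  5. 20.]
print(7 * (A @ x0))                           # [ 3. 18. 25. 20.], the projection

# same answer from the projection matrix P = A (A^T A)^{-1} A^T
P = A @ np.linalg.inv(A.T @ A) @ A.T
print(np.allclose(P @ b, A @ x0))             # True
```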
Exercise 5.7.6. 1. Let W = {(x, y, z, w) ∈ R4 : x = y, z = w} be a subspace of R4 .
Determine the matrix of the orthogonal projection.
2. Let PW1 and PW2 be the orthogonal projections of R2 onto W1 = {(x, 0) : x ∈ R} and
W2 = {(x, x) : x ∈ R}, respectively. Note that PW1 ◦ PW2 is a projection onto W1 . But,
it is not an orthogonal projection. Hence or otherwise, conclude that the composition of
two orthogonal projections need not be an orthogonal projection?
" #
1 1
3. Let A = . Then, A is idempotent but not symmetric. Now, define P : R2 → R2 by
0 0
P (v) = Av, for all v ∈ R2 . Then,
(a) P is idempotent.
(b) Null(P ) ∩ Rng(P ) = Null(A) ∩ Col(A) = {0}.
(c) R2 = Null(P ) + Rng(P ). But, (Rng(P ))⊥ = (Col(A))⊥ 6= Null(A).
(d) Since (Col(A))⊥ 6= Null(A), the map P is not an orthogonal projector. In this
case, P is called a projection of R2 onto Rng(P ) along Null(P ).
4. Find all 2 × 2 real matrices A such that A2 = A. Hence, or otherwise, determine all
projection operators of R2 .
5. Let W be an (n − 1)-dimensional subspace of Rn with ordered basis BW = [f1 , . . . , fn−1 ].
Suppose $B = [f_1, \ldots, f_{n-1}, f_n]$ is an orthogonal ordered basis of $\mathbb{R}^n$ obtained by extending $B_W$. Now, define a function $Q : \mathbb{R}^n \to \mathbb{R}^n$ by $Q(v) = \langle v, f_n\rangle f_n - \sum_{i=1}^{n-1} \langle v, f_i\rangle f_i$. Then, prove that
(a) In − PW = PW⊥ .
(b) (PW )T = PW and (PW⊥ )T = PW⊥ . That is, PW and PW⊥ are symmetric.
(c) (PW )2 = PW and (PW⊥ )2 = PW⊥ . That is, PW and PW⊥ are idempotent.
(d) PW ◦ PW⊥ = PW⊥ ◦ PW = 0.
5.8 Orthogonal Operator and Rigid Motion∗

Example 5.8.2. Prove that the following maps T are orthogonal operators.
1. Fix a unit vector a ∈ Rn and define T : Rn → Rn by T (x) = 2hx, aia − x, for all x ∈ Rn .
Solution: Note that $\mathrm{Proj}_a(x) = \langle x, a\rangle a$. So, $\langle \langle x, a\rangle a,\; x - \langle x, a\rangle a\rangle = 0$ and
$$\|T(x)\|^2 = \|\langle x, a\rangle a + (\langle x, a\rangle a - x)\|^2 = \|\langle x, a\rangle a\|^2 + \|x - \langle x, a\rangle a\|^2 = \|x\|^2.$$
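A quick numerical sanity check (not from the text) that this map preserves lengths, assuming NumPy; the vectors a and x below are random and purely illustrative.

```python
import numpy as np

# Check that T(x) = 2 <x, a> a - x with ||a|| = 1 preserves the norm.
rng = np.random.default_rng(1)
a = rng.normal(size=4)
a /= np.linalg.norm(a)                        # unit vector

def T(x):
    return 2 * np.dot(x, a) * a - x

x = rng.normal(size=4)
print(np.isclose(np.linalg.norm(T(x)), np.linalg.norm(x)))   # True
```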
2. Fix $\theta$, $0 \le \theta < 2\pi$, and define $T : \mathbb{R}^2 \to \mathbb{R}^2$ by $T(\mathbf{x}) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}$, for all $\mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix} \in \mathbb{R}^2$.
Solution: Note that $\|T(\mathbf{x})\| = \left\| \begin{bmatrix} x\cos\theta - y\sin\theta \\ x\sin\theta + y\cos\theta \end{bmatrix} \right\| = \sqrt{x^2 + y^2} = \left\| \begin{bmatrix} x \\ y \end{bmatrix} \right\|$.
We now show that an operator is orthogonal if and only if it preserves the angle.
Theorem 5.8.3. Let T ∈ L(Rn ). Then, the following statements are equivalent.
1. T is an orthogonal operator.
2. hT (x), T (y)i = hx, yi, for all x, y ∈ Rn , i.e., T preserves inner product.
Corollary 5.8.4. Let T ∈ L(Rn ). Then, T is an orthogonal operator if and only if “for every
orthonormal basis {u1 , . . . , un } of Rn , {T (u1 ), . . . , T (un )} is an orthonormal basis of Rn ”.
Observe that if T and S are two rigid motions then ST is also a rigid motion. Furthermore,
it is clear from the definition that every rigid motion is invertible.
We now prove that every rigid motion that fixes origin is an orthogonal operator.
Theorem 5.8.7. The following statements are equivalent for any map T : Rn → Rn .
1. T is a rigid motion that fixes the origin, i.e., T(0) = 0 and ‖T(x) − T(y)‖ = ‖x − y‖, for all x, y ∈ Rn .
2. T is linear and hT (x), T (y)i = hx, yi, for all x, y ∈ Rn (preserves inner product).
3. T is an orthogonal operator.
Proof. We have already seen the equivalence of Part 2 and Part 3 in Theorem 5.8.3. Let us now
prove the equivalence of Part 1 and Part 2/Part 3.
If T is an orthogonal operator then T (0) = 0 and kT (x) − T (y)k = kT (x − y)k = kx − yk.
This proves Part 3 implies Part 1.
We now prove Part 1 implies Part 2. So, let T be a rigid motion that fixes 0. Thus,
T (0) = 0 and kT (x) − T (y)k = kx − yk, for all x, y ∈ Rn . Hence, in particular for y = 0, we
have kT (x)k = kxk, for all x ∈ Rn . So,
kT (x)k2 + kT (y)k2 − 2hT (x), T (y)i = kT (x) − T (y)k2 = kx − yk2 = kxk2 + kyk2 − 2hx, yi.
Thus, using kT (x)k = kxk, for all x ∈ Rn , we get hT (x), T (y)i = hx, yi, for all x, y ∈ Rn . Now,
to prove T is linear, we use hT (x), T (y)i = hx, yi, for all x, y ∈ Rn , in 3-rd and 4-th line below
to get
Thus, T (x+y)−(T (x) + T (y)) = 0 and hence T (x+y) = T (x)+T (y). A similar calculation
Prove that Orthogonal and Unitary congruences are equivalence relations on Mn (R) and
Mn (C), respectively.
2. Let x ∈ C2 . Identify it with the complex number x = x1 + ix2 . If we rotate x by a
counterclockwise rotation θ, 0 ≤ θ < 2π then, we have
xeiθ = (x1 + ix2 ) (cos θ + i sin θ) = x1 cos θ − x2 sin θ + i[x1 sin θ + x2 cos θ].
5. Let U ∈ Mn (C). Then, prove that the following statements are equivalent.
angle.
(g) for any vector x ∈ Cn , kU xk = kxk Unitary matrices preserve length.
6. If $A = [a_{ij}]$ and $B = [b_{ij}]$ are unitarily equivalent then prove that $\sum_{ij} |a_{ij}|^2 = \sum_{ij} |b_{ij}|^2$.
7. Let U be a unitary matrix and for every x ∈ Cn , define
Definition 5.9.3. Let V be a vector space over F. Then L(V, F) is called the dual space of
V and is denoted by V∗ . The double dual space of V, denoted V∗∗ , is the dual space of V∗ .
Corollary 5.9.4. Let V and W be vector spaces over F with dim V = n and dim W = m.
1. Then $L(V, W) \cong \mathbb{F}^{mn}$. Moreover, $\{f_{ij} \mid 1 \le i \le n, 1 \le j \le m\}$ is a basis of $L(V, W)$.
2. In particular, if $W = \mathbb{F}$ then $L(V, \mathbb{F}) = V^* \cong \mathbb{F}^n$. Moreover, if $\{v_1, \ldots, v_n\}$ is a basis of V then the set $\{f_i \mid 1 \le i \le n\}$ is a basis of $V^*$, where $f_i(v_k) = \begin{cases} 1, & \text{if } k = i \\ 0, & \text{if } k \ne i. \end{cases}$ The basis $\{f_i \mid 1 \le i \le n\}$ is called the dual basis of $\mathbb{F}^n$.
Exercise 5.9.5. Let V be a vector space. Suppose there exists v ∈ V such that f (v) = 0, for
all f ∈ V∗ . Then prove that v = 0.
So, we see that V∗ can be understood through a basis of V. Thus, one can understand V∗∗ again via a basis of V∗. But, the question arises "can we understand it directly via the vector space V itself?" We answer this in the affirmative by giving a canonical isomorphism from V to V∗∗.
So, for each v ∈ V, we have obtained a linear functional Lv ∈ V∗∗ . Note that, if v 6= w then,
Lv 6= Lw . Indeed, if Lv = Lw then, Lv (f ) = Lw (f ), for all f ∈ V∗ . Thus, f (v) = f (w), for all
f ∈ V∗ . That is, f (v − w) = 0, for each f ∈ V∗ . Hence, using Exercise 5.9.5, we get v − w = 0,
or equivalently, v = w.
We use the above argument to give the required canonical isomorphism.
Theorem 5.9.6. Let V be a vector space over F. If dim(V) = n then the canonical map
T : V → V∗∗ defined by T (v) = Lv is an isomorphism.
Thus, Lαv+u = αLv +Lu . Hence, T (αv+u) = αT (v)+T (u). Thus, T is a linear transformation.
For verifying T is one-one, assume that T (v) = T (u), for some u, v ∈ V. Then, Lv = Lu . Now,
use the argument just before this theorem to get v = u. Therefore, T is one-one.
Thus, T gives an inclusion (one-one) map from V to V∗∗ . Further, applying Corollary 5.9.4.2
to V∗ , gives dim(V∗∗ ) = dim(V∗ ) = n. Hence, the required result follows.
We now give a few immediate consequences of Theorem 5.9.6.
Proof. Part 1 is direct as T : V → V∗∗ was a canonical inclusion map. For Part 2, we need to
show that
$$L_{v_i}(f_j) = \begin{cases} 1, & \text{if } j = i \\ 0, & \text{if } j \ne i \end{cases} \quad \text{or equivalently} \quad f_j(v_i) = \begin{cases} 1, & \text{if } j = i \\ 0, & \text{if } j \ne i, \end{cases}$$
which indeed holds true using Corollary 5.9.4.2.
Let V be a finite dimensional vector space. Then Corollary 5.9.7 implies that the spaces V
and V∗ are naturally dual to each other.
We are now ready to prove the main result of this subsection. To start with, let V and W
be vector spaces over F. Then, for each $T \in L(V, W)$, we want to define a map $\widehat{T} : W^* \to V^*$. So, if $g \in W^*$ then $\widehat{T}(g)$ is a linear functional from V to F. So, we need to be able to evaluate $\widehat{T}(g)$ at an element of V. Thus, we define $\big(\widehat{T}(g)\big)(v) = g(T(v))$, for all $v \in V$. Now, we note that
$\widehat{T} \in L(W^*, V^*)$, as for every $g, h \in W^*$,
$$\big(\widehat{T}(\alpha g + h)\big)(v) = (\alpha g + h)(T(v)) = \alpha g(T(v)) + h(T(v)) = \big(\alpha \widehat{T}(g) + \widehat{T}(h)\big)(v),$$
Theorem 5.9.8. Let V and W be vector spaces over F with ordered bases A = (v1 , . . . , vn )
and B = (w1 , . . . , wm ), respectively. Also, let A∗ = (f1 , . . . , fn ) and B ∗ = (g1 , . . . , gm ) be the
corresponding ordered bases of the dual spaces V∗ and W∗ , respectively. Then,
Now, recall that the functionals $f_i$'s and $g_j$'s satisfy $\big(\sum_{k=1}^{n} \alpha_k f_k\big)(v_t) = \sum_{k=1}^{n} \alpha_k (f_k(v_t)) = \alpha_t$, for $1 \le t \le n$, and $[g_j(w_1), \ldots, g_j(w_m)] = e_j^T$, a row vector with 1 at the $j$-th place and 0 elsewhere. So, let $B = [w_1, \ldots, w_m]$ and evaluate $\widehat{T}(g_j)$ at the $v_t$'s, the elements of $A$:
$$\big(\widehat{T}(g_j)\big)(v_t) = g_j(T(v_t)) = g_j\big(B\,[T(v_t)]_B\big) = [g_j(w_1), \ldots, g_j(w_m)]\,[T(v_t)]_B = e_j^T\begin{bmatrix} a_{1t} \\ a_{2t} \\ \vdots \\ a_{mt} \end{bmatrix} = a_{jt} = \Big(\sum_{k=1}^{n} a_{jk} f_k\Big)(v_t).$$
Thus, the linear functionals $\widehat{T}(g_j)$ and $\sum_{k=1}^{n} a_{jk} f_k$ are equal at $v_t$, for $1 \le t \le n$, the basis vectors of V. Hence $\widehat{T}(g_j) = \sum_{k=1}^{n} a_{jk} f_k$, which gives Equation (5.9.1).
Remark 5.9.9. The proof of Theorem 5.9.8 also shows the following.
1. For each $T \in L(V, W)$ there exists a unique map $\widehat{T} \in L(W^*, V^*)$ such that $\big(\widehat{T}(g)\big)(v) = g(T(v))$, for each $g \in W^*$.
2. The coordinate matrices $T[A, B]$ and $\widehat{T}[B^*, A^*]$ are transposes of each other, where the ordered bases $A^*$ of $V^*$ and $B^*$ of $W^*$ are dual to the ordered bases $A$ of V and $B$ of W.
3. Thus, the results on matrices and their transposes can be re-written in the language of a vector space and its dual space.
5.10 Summary
Chapter 6

Eigenvalues, Eigenvectors and Diagonalizability

6.1 Introduction and Definitions
Note that we have been trying to solve the linear system Ax = b. But, in most cases, we are
not able to solve it because of certain restrictions. Hence in the last chapter, we looked at the
nearest solution or obtained the projection of b on the column space of A.
These problems arise from the fact that either our data size is too large, or there is missing information (the data is incomplete, has ambiguities or is inaccurate), or the data is coming in too fast in the sense that our computational power doesn't match the speed at which it is received, or it could be any other reason. So, to take care of such issues, we either
work with a submatrix of A or with the matrix AT A. We also try to concentrate on only a few
important aspects depending on our past experience.
Thus, we need to find certain set of critical vectors/directions associated with the given
linear system. Hence, in this chapter, all our matrices will be square matrices. They will have
real numbers as entries for convenience. But, we need to work over complex numbers. Hence,
we will be working with Mn (C) and x = (x1 , . . . , xn )T ∈ Cn , for some n ∈ N. Further, Cn will
be considered only as a complex vector space. We start with an example for motivation.
Example 6.1.1. Let A be a real symmetric matrix. Consider the problem of finding the extrema of $\mathbf{x}^T A \mathbf{x}$ subject to the condition $\mathbf{x}^T \mathbf{x} = 1$. Using the Lagrange multiplier $\lambda$, define
$$L(\mathbf{x}, \lambda) = \mathbf{x}^T A \mathbf{x} - \lambda(\mathbf{x}^T \mathbf{x} - 1) = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij} x_i x_j - \lambda\Big(\sum_{i=1}^{n} x_i^2 - 1\Big).$$
Partial differentiation with respect to each variable gives
$$\frac{\partial L}{\partial x_1} = 2a_{11}x_1 + 2a_{12}x_2 + \cdots + 2a_{1n}x_n - 2\lambda x_1, \;\;\ldots,\;\; \frac{\partial L}{\partial x_n} = 2a_{n1}x_1 + 2a_{n2}x_2 + \cdots + 2a_{nn}x_n - 2\lambda x_n.$$
Therefore, to get the points of extremum, we solve for
$$\mathbf{0} = \left(\frac{\partial L}{\partial x_1}, \frac{\partial L}{\partial x_2}, \ldots, \frac{\partial L}{\partial x_n}\right)^T = \frac{\partial L}{\partial \mathbf{x}} = 2(A\mathbf{x} - \lambda\mathbf{x}).$$
Thus, to solve the extremal problem, we need $\lambda \in \mathbb{R}$ and $\mathbf{x} \in \mathbb{R}^n$ such that $\mathbf{x} \ne 0$ and $A\mathbf{x} = \lambda\mathbf{x}$.
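The connection between this extremal problem and eigenvalues can also be seen numerically: for a symmetric matrix, the extreme values of xᵀAx over the unit sphere are the smallest and largest eigenvalues, attained at eigenvectors. A sketch (not from the text, assuming NumPy; the matrix is the symmetric A of Example 6.1.2 below):

```python
import numpy as np

# Extrema of x^T A x over ||x|| = 1 are attained at eigenvectors of A,
# and the extremal values are the eigenvalues.
A = np.array([[1.0, 2.0],
              [2.0, 1.0]])
vals, vecs = np.linalg.eigh(A)                # eigen-pairs of a symmetric matrix
print(vals)                                   # [-1.  3.]
for i in range(2):
    x = vecs[:, i]                            # unit eigenvector
    print(x @ A @ x)                          # equals vals[i]

# any other unit vector gives a value strictly between the two eigenvalues
rng = np.random.default_rng(0)
x = rng.normal(size=2)
x /= np.linalg.norm(x)
print(vals[0] <= x @ A @ x <= vals[-1])       # True
```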
Note that we could have started with a Hermitian matrix and arrived at a similar situation.
So, in previous chapters, we had looked at Ax = b, where A and b were known. Here, we need
to solve Ax = λx with x 6= 0. Note that 0 is already a solution and is not of interest to us.
Further, we will see that we are interested in only those solutions of Ax = λx which are linearly
independent. To proceed further, let us take a few examples, where we will try to look at what
does the system Ax = b imply?

Example 6.1.2. 1. Let $A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$, $B = \begin{bmatrix} 9 & -2 \\ -2 & 6 \end{bmatrix}$ and $\mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix}$.
(a) $A$ magnifies the vector $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ as $A\begin{bmatrix} 1 \\ 1 \end{bmatrix} = 3\begin{bmatrix} 1 \\ 1 \end{bmatrix}$, and changes the direction of $\begin{bmatrix} 1 \\ -1 \end{bmatrix}$ as $A\begin{bmatrix} 1 \\ -1 \end{bmatrix} = -1\begin{bmatrix} 1 \\ -1 \end{bmatrix}$. Further, the vectors $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} 1 \\ -1 \end{bmatrix}$ are orthogonal.
(b) $B$ magnifies both the vectors $\begin{bmatrix} 1 \\ 2 \end{bmatrix}$ and $\begin{bmatrix} -2 \\ 1 \end{bmatrix}$ as $B\begin{bmatrix} 1 \\ 2 \end{bmatrix} = 5\begin{bmatrix} 1 \\ 2 \end{bmatrix}$ and $B\begin{bmatrix} 2 \\ -1 \end{bmatrix} = 10\begin{bmatrix} 2 \\ -1 \end{bmatrix}$. Here again, the vectors $\begin{bmatrix} 1 \\ 2 \end{bmatrix}$ and $\begin{bmatrix} 2 \\ -1 \end{bmatrix}$ are orthogonal.
(c) $\mathbf{x}^T A \mathbf{x} = 3\dfrac{(x+y)^2}{2} - \dfrac{(x-y)^2}{2}$. Here, the displacements occur along the perpendicular lines $x + y = 0$ and $x - y = 0$, where $x + y = (x, y)\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $x - y = (x, y)\begin{bmatrix} 1 \\ -1 \end{bmatrix}$. Whereas $\mathbf{x}^T B \mathbf{x} = 5\dfrac{(x+2y)^2}{5} + 10\dfrac{(2x-y)^2}{5}$. Here also the maximum/minimum displacements occur along the orthogonal lines $x + 2y = 0$ and $2x - y = 0$, where $x + 2y = (x, y)\begin{bmatrix} 1 \\ 2 \end{bmatrix}$ and $2x - y = (x, y)\begin{bmatrix} 2 \\ -1 \end{bmatrix}$.
(d) the curve $\mathbf{x}^T A \mathbf{x} = 10$ represents a hyperbola, whereas the curve $\mathbf{x}^T B \mathbf{x} = 10$ represents an ellipse (see the left two curves in Figure 6.1, drawn using the package "Sagemath").

Figure 6.1: A Hyperbola and two Ellipses (first one has orthogonal axes)
In the above two examples we looked at symmetric matrices. What if our matrix is not
symmetric?
2. Let $C = \begin{bmatrix} 7 & -2 \\ 2 & 2 \end{bmatrix}$, a non-symmetric matrix. Then, does there exist a non-zero $\mathbf{x} \in \mathbb{C}^2$ which gets magnified by $C$?
We need $\mathbf{x} \ne 0$ and $\alpha \in \mathbb{C}$ such that $C\mathbf{x} = \alpha\mathbf{x} \Leftrightarrow [\alpha I_2 - C]\mathbf{x} = 0$. As $\mathbf{x} \ne 0$, $[\alpha I_2 - C]\mathbf{x} = 0$ has a non-trivial solution if and only if $\det[\alpha I - C] = 0$. But,
$$\det[\alpha I - C] = \det\begin{bmatrix} \alpha - 7 & 2 \\ -2 & \alpha - 2 \end{bmatrix} = \alpha^2 - 9\alpha + 18.$$
So $\alpha = 6, 3$. For $\alpha = 6$, verify that $\mathbf{x} = \begin{bmatrix} 2 \\ 1 \end{bmatrix} \ne 0$ satisfies $C\mathbf{x} = 6\mathbf{x}$. Similarly, $\mathbf{x} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$ satisfies $C\mathbf{x} = 3\mathbf{x}$. In this example,
(a) we still have magnifications in the directions $\begin{bmatrix} 2 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} 1 \\ 2 \end{bmatrix}$.
(b) the maximum/minimum displacements do not occur along the lines $2x + y = 0$ and $x + 2y = 0$ (see the third curve in Figure 6.1). Note that
$$\{\mathbf{x} \in \mathbb{R}^2 : \mathbf{x}^T C \mathbf{x} = 10\} = \Big\{\mathbf{x} \in \mathbb{R}^2 : \mathbf{x}^T \begin{bmatrix} 7 & 0 \\ 0 & 2 \end{bmatrix} \mathbf{x} = 10\Big\},$$
where $\begin{bmatrix} 7 & 0 \\ 0 & 2 \end{bmatrix}$ is a symmetrization of $C$.
(c) the lines 2x + y = 0 and x + 2y = 0 are not orthogonal.
We observe the following about the matrices A, B and C that appear above:
1. $\det(A) = -3 = 3 \times -1$, $\det(B) = 50 = 5 \times 10$ and $\det(C) = 18 = 6 \times 3$.
2. $\mathrm{tr}(A) = 2 = 3 - 1$, $\mathrm{tr}(B) = 15 = 5 + 10$ and $\mathrm{tr}(C) = 9 = 6 + 3$.
3. The sets $\left\{\begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \end{bmatrix}\right\}$, $\left\{\begin{bmatrix} 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 2 \\ -1 \end{bmatrix}\right\}$ and $\left\{\begin{bmatrix} 2 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \end{bmatrix}\right\}$ are linearly independent.
4. If $v_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $v_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$ and $S = [v_1, v_2]$ then
(a) $AS = [Av_1, Av_2] = [3v_1, -v_2] = S\begin{bmatrix} 3 & 0 \\ 0 & -1 \end{bmatrix} \Leftrightarrow S^{-1}AS = \begin{bmatrix} 3 & 0 \\ 0 & -1 \end{bmatrix} = \mathrm{diag}(3, -1)$.
(b) Let $u_1 = \frac{1}{\sqrt{2}} v_1$ and $u_2 = \frac{1}{\sqrt{2}} v_2$. Then, $u_1$ and $u_2$ are orthonormal unit vectors, i.e., if $U = [u_1, u_2]$ then $I = UU^* = u_1 u_1^* + u_2 u_2^*$ and $A = 3 u_1 u_1^* - u_2 u_2^*$.
5. If $v_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$ and $v_2 = \begin{bmatrix} 2 \\ -1 \end{bmatrix}$ and $S = [v_1, v_2]$ then
(a) $BS = [Bv_1, Bv_2] = [5v_1, 10v_2] = S\begin{bmatrix} 5 & 0 \\ 0 & 10 \end{bmatrix} \Leftrightarrow S^{-1}BS = \begin{bmatrix} 5 & 0 \\ 0 & 10 \end{bmatrix} = \mathrm{diag}(5, 10)$.
(b) Let $u_1 = \frac{1}{\sqrt{5}} v_1$ and $u_2 = \frac{1}{\sqrt{5}} v_2$. Then, $u_1$ and $u_2$ are orthonormal unit vectors, i.e., if $U = [u_1, u_2]$ then $I = UU^* = u_1 u_1^* + u_2 u_2^*$ and $B = 5 u_1 u_1^* + 10 u_2 u_2^*$.
6. If $v_1 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$ and $v_2 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$ and $S = [v_1, v_2]$ then $S^{-1}CS = \begin{bmatrix} 6 & 0 \\ 0 & 3 \end{bmatrix} = \mathrm{diag}(6, 3)$.
Thus, in each of the cases above, we obtained a basis of R2 with respect to which the matrix representation of the corresponding linear map is a diagonal matrix. To understand the ideas better, we start with the following definitions.
Ax = λx ⇔ (A − λIn )x = 0 (6.1.1)
Proof. Let B = A − αIn . Then, by definition, α is an eigenvalue of A if and only if the system
Bx = 0 has a non-trivial solution. By Theorem 2.6.3 this holds if and only if det(B) = 0.
Definition 6.1.5. Let A ∈ Mn (C). Then det(A − λI) is a polynomial of degree n in λ and is
called the characteristic polynomial of A, denoted PA (λ), or in short P (λ).
Remark 6.1.6. Let A ∈ Mn (C). Then A is singular if and only if 0 ∈ σ(A). Further, the
following statements hold.
1. If α ∈ σ(A) then
(a) $\{0\} \subsetneq \mathrm{Null}(A - \alpha I)$. Therefore, if $\mathrm{Rank}(A - \alpha I) = r$ then $r < n$. Hence, by
Theorem 2.6.3, the system (A − αI)x = 0 has n − r linearly independent solutions.
(b) v ∈ Null(A−αI) if and only if cv ∈ Null(A−αI), for c 6= 0. Thus, an eigenvector
v of A is in some sense a line ` = Span({v}) that passes through 0 and v and has
the property that the image of ` is either ` itself or 0.
r
ci xi ∈ Null(A − αI), for all ci ∈ C. Hence, if
P
(c) If x1 , . . . , xr ∈ Null(A − αI) then
i=1
S is a collection of eigenvectors then, we necessarily want the set S to be linearly
independent.
T
2. α ∈ σ(A) if and only if α is a root of PA (x) ∈ C[x]. As deg(PA (x)) = n, A has exactly n
AF
Almost all books in mathematics differentiate between characteristic value and eigenvalue
as the ideas change when one moves from complex numbers to any other scalar field. We give
the following example for clarity.
Remark 6.1.7. Let A ∈ M2 (F). Then, A induces a map T ∈ L(F2 ) defined by T (x) = Ax, for
all x ∈ F2 . We use this idea to understand the difference.
" #
0 1
1. Let A = . Then, pA (λ) = λ2 + 1.
−1 0
Let us look at some more examples. Also, as stated earlier, we look at roots of the charac-
teristic equation over C.
1 −1 i 1
5. Let A = . Then, 1 + i, and 1 − i, are the eigen-pairs of A.
1 1 1 i
DR
0 1 0
6. Let A = 0 0 1 . Then, σ(A) = {0, 0, 0} with e1 as the only eigenvector.
0 0 0
0 1 0 0 0 x1
0 0 1 0 0 x2
7. Let A = 0 0 0 0 0. Then, σ(A) = {0, 0, 0, 0, 0}. Note that Ax3 = 0 implies
0 0 0 0 1 x
4
0 0 0 0 0 x5
x2 = 0 = x3 = x5 . Thus, e1 and e4 are the only eigenvectors. Note that the diagonal
blocks of A are nilpotent matrices.
Exercise 6.1.9. 1. Prove that the matrices A and AT have the same set of eigenvalues.
Construct a 2 × 2 matrix A such that the eigenvectors of A and AT are different.
4. Let A be a nilpotent matrix. Then, prove that its eigenvalues are all 0.
5. Let J = 11T ∈ Mn (C). Then, J is a matrix with each entry 1. Show that
6.1. INTRODUCTION AND DEFINITIONS 159
6. Let A = [aij ] ∈ Mn (R), where aij = a, if i = j and b, otherwise. Then, verify that
A = (a − b)I + bJ. Hence, or otherwise determine the eigenvalues and eigenvectors of J.
9. Let A ∈ Mn (C) satisfy kAxk ≤ kxk for all x ∈ Cn . Then prove that every eigenvalue of
A lies between −1 and 1.
n
10. Let A = [aij ] ∈ Mn (C) with
P
aij = a, for all 1 ≤ i ≤ n. Then, prove that a is an
j=1
eigenvalue of A with corresponding eigenvector 1 = [1, 1, . . . , 1]T .
" #
B 0
11. Let B ∈ Mn (C) and C ∈ Mm (C). Let Z = . Then
0 C
" #!
x
(a) (α, x) is an eigen-pair for B implies α, is an eigen-pair for Z.
0
T
AF
" #!
0
(b) (β, y) is an eigen-pair for C implies β, is an eigen-pair for Z.
DR
Definition 6.1.10. Let A ∈ L(Cn ). Then, a vector y ∈ Cn \ {0} satisfying y∗ A = λy∗ is called
a left eigenvector of A for λ.
" # " # " # " # " #
7 −2 2 1 2 1
Example 6.1.11. Let A = , x= , y = , u= and v = . Then
2 2 1 2 −1 −2
verify that (6, x) and (3, y) are (right) eigen-pairs of A and (6, u), (3, v) are left eigen-pairs of
A. Note that xT v = 0 and yT u = 0. This is true in general and is proved next.
Theorem 6.1.12. [Principle of bi-orthogonality] Let (λ, x) be a (right) eigen-pair and (µ, y)
be a left eigen-pair of A. If λ 6= µ then y is orthogonal to x.
Proof. Verify that µy∗ x = (y∗ A)x = y∗ (Ax) = y∗ (λx) = λy∗ x. Thus y∗ x = 0.
Exercise 6.1.13. 1. Let Ax = λx and x∗ A = µx∗ . Then µ = λ.
2. Let S be a non-singular matrix such that its columns are left eigenvectors of A. Then,
prove that the columns of (S ∗ )−1 are right eigenvectors of A.
Proposition 6.1.15. Let T ∈ L(Cn ) and let B be an ordered basis in Cn . Then (α, v) is an
eigen-pair of T if and only if (α, [v]B ) is an eigen-pair of A = T [B, B].
160 CHAPTER 6. EIGENVALUES, EIGENVECTORS AND DIAGONALIZABILITY
Proof. By definition, T (v) = αv if and only if [T v]B = [αv]B . Or equivalently, α ∈ σ(T ) if and
only if A[v]B = α[v]B . Thus, the required result follows.
Thus, the spectrum of a linear operator is independent of the choice of basis.
Remark 6.1.16. We give two examples to show that a linear operator on an infinite
dimensional vector space need not have an eigenvalue.
1. Let V be the space of all real sequences (see Example 3.1.4.7) and define T ∈ L(V) by
We now prove the observations that det(A) is the product of eigenvalues and tr(A) is the
sum of eigenvalues.
T
AF
Theorem 6.1.17. Let λ1 , . . . , λn , not necessarily distinct, be the A = [aij ] ∈ Mn (C). Then,
n
Q Pn n
P
det(A) = λi and tr(A) = aii = λi .
DR
for some a0 , a1 , . . . , an−1 ∈ C. Then, an−1 , the coefficient of (−1)n−1 xn−1 , comes from the term
3. Let A be a 3 × 3 orthogonal matrix (AAT = I). If det(A) = 1, then prove that there exists
v ∈ R3 \ {0} such that Av = v.
We now show that for any eigenvalue α, the algebraic and geometric multiplicities do not
change under similarity transformation, or equivalently, under change of basis.
2. for each α ∈ σ(A), Alg.Mulα (A) = Alg.Mulα (B) and Geo.Mulα (A) = Geo.Mulα (B).
Proof. Since A and B are similar, there exists an invertible matrix S such that A = SBS −1 .
So, α ∈ σ(A) if and only if α ∈ σ(B) as
Note that Equation (6.2.5) also implies that Alg.Mulα (A) = Alg.Mulα (B). We will now
show that Geo.Mulα (A) = Geo.Mulα (B).
So, let Q1 = {v1 , . . . , vk } be a basis of Null(A − αI). Then, B = SAS −1 implies that
Q2 = {Sv1 , . . . , Svk } ⊆ Null(B − αI). Since Q1 is linearly independent and S is invertible, we
get Q2 is linearly independent. So, Geo.Mulα (A) ≤ Geo.Mulα (B). Now, we can start with
eigenvectors of B and use similar arguments to get Geo.Mulα (B) ≤ Geo.Mulα (A). Hence
the required result follows.
Remark 6.2.4. 1. Let A = S −1 BS. Then, from the proof of Theorem 6.2.3, we see that x
is an eigenvector of A for λ if and only if Sx is an eigenvector of B for λ.
0 0 0 0
3. Let A ∈ Mn (C). Then, for any invertible matrix B, the matrices AB and BA =
DR
B(AB)B −1 are similar. Hence, in this case the matrices AB and BA have
(a) the same set of eigenvalues.
(b) Alg.Mulα (AB) = Alg.Mulα (BA), for each α ∈ σ(A).
(c) Geo.Mulα (AB) = Geo.Mulα (BA), for each α ∈ σ(A).
We will now give a relation between the geometric multiplicity and the algebraic multiplicity.
Theorem 6.2.5. Let A ∈ Mn (C). Then, for α ∈ σ(A), Geo.Mulα (A) ≤ Alg.Mulα (A).
Proof. Let Geo.Mulα (A) = k. So, suppose that {v1 , . . . , vk } is an orthonormal basis of
Null(A − αI). Extend it to get {v1 , . . . , vk , vk+1 , . . . , vn } as an orthonormal basis of Cn . Put
P = [v1 , . . . , vk , vk+1 , . . . , vn ]. Then P ∗ = P −1 and
2 3 1
h i
(x1 , e1 , e2 ) be an ordered basis of C3 . Put X = x1 e1 e2 . Compute X −1 AX. Can
you now find the remaining eigenvalues of A?
(b) If 0 ∈ σ(AB) and n = m then Alg.Mul0 (AB) = Alg.Mul0 (BA) as there are n
eigenvalues, counted with multiplicity.
(c) Give an example to show that Geo.Mul0 (AB) need not equal Geo.Mul0 (BA) even
when n = m.
5. Let A, B ∈ Mn (R). Also, let (λ1 , u) and (λ2 , v) are eigen-pairs of A and B, respectively.
164 CHAPTER 6. EIGENVALUES, EIGENVECTORS AND DIAGONALIZABILITY
2 1 1
DR
3. Let A = 0 2 1. Then, A cannot be diagonalized.
0 0 2
Solution: A is diagonalizable implies A is similar to a diagonal matrix D with diagonal
entries {d1 , d2 , d3 } = {2, 2, 2}. Hence, D = 2I3 ⇒ A = SDS −1 = 2I3 , a contradiction.
" # " #! " #!
0 1 i −i
4. Let A = . Then, i, and −i, are two eigen-pairs of A. Define
−1 0 1 1
" # " #
i −i −i 0
U = √12 . Then, U ∗ U = I2 = U U ∗ and U ∗ AU = .
1 1 0 i
Proof. Let {v1 , . . . , vk } be linearly dependent. Then, there exists a smallest ` ∈ {1, . . . , k − 1}
and c 6= 0 such that v`+1 = c1 v1 + · · · + c` v` . So,
and
0 = (α`+1 − α1 ) c1 v1 + · · · + (α`+1 − α` ) c` v` .
DR
So, v` ∈ LS(v1 , . . . , v`−1 ), a contradiction to the choice of `. Thus, the required result follows.
An immediate corollary of Theorem 6.3.3 and Theorem 6.3.4 is stated next without proof.
k
S
So, to prove that Si is linearly independent, consider the linear system
i=1
in the variables cij ’s. Now, applying the matrix pj (A) and using Equation (6.3.3), we get
Y
(αj − αi ) cj1 uj1 + · · · + cjnj ujnj = 0.
i6=j
Q
But (αj − αi ) 6= 0 as αi ’s are distinct. Hence, cj1 uj1 + · · · + cjnj ujnj = 0. As Sj is a basis
i6=j
of Null(A − αj In ), we get cjt = 0, for 1 ≤ t ≤ nj . Thus, the required result follows.
i=1 i=1
k k
AF
P P
Now, assume that mi = ni , for 1 ≤ i ≤ k. Then A has ni = mi = n linearly
i=1 i=1
DR
3. Let A be an n×n matrix with λ ∈ σ(A) with alg.mulλ (A) = m. If Rank[A−λI] 6= n−m
then prove that A is not diagonalizable.
6.4. SCHUR’S UNITARY TRIANGULARIZATION AND DIAGONALIZABILITY 167
4. Let A and B be two similar matrices such that A is diagonalizable. Prove that B is
diagonalizable.
5. If σ(A) = σ(B) and both A and B are diagonalizable then prove that A is similar to B.
Thus, they are two basis representation of the same linear transformation.
" #
A 0
6. Let A ∈ Mn (R) and B ∈ Mm (R). Suppose C = . Then, prove that C is diagonal-
0 B
izable if and only if both A and B are diagonalizable.
1 0 −1 1 −3 3 1 3 3
0 0 1 , 0 −5 6 and 0 −5 6 diagonalizable?
11. Are the matrices
0 2 0 0 −3 4 0 −3 4
13. Let u, v ∈ Cn such that {u, v} is a linearly independent set. Define A = uvT + vuT .
(a) Then prove that A is a symmetric matrix.
(b) Then prove that dim(Ker(A)) = n − 2.
(c) Then 0 ∈ σ(A) and has multiplicity n − 2.
(d) Determine the other eigenvalues of A.
Lemma 6.4.1. [Schur’s unitary triangularization (SUT)] Let A ∈ Mn (C). Then, there exists
a unitary matrix U such that A is similar to an upper triangular matrix. Further, if A ∈ Mn (R)
and σ(A) have real entries then U is a real orthogonal matrix.
Proof. We prove the result by induction on n. The result is clearly true for n = 1. So, let n > 1
and assume the result to be true for k < n and prove it for n.
Let (λ1 , x1 ) be an eigen-pair of A with kx1 k = 1. Now, extend it to form an orthonormal
basis {x1 , x2 , . . . , xn } of Cn and define X = [x1 , x2 , . . . , xn ]. Then, X is a unitary matrix and
x∗
1∗ " #
x2 λ ∗
1
X ∗ AX = X ∗ [Ax1 , Ax2 , . . . , Axn ] = . [λ1 x1 , Ax2 , . . . , Axn ] = , (6.4.4)
.. 0 B
x∗n
where B ∈ Mn−1 (C). Now, by induction hypothesis there exists a unitary # U ∈ Mn−1 (C)
" matrix
such that U ∗ BU = T is an upper triangular matrix. Define U b = X 1 0 . As product of
0 U
unitary matrices is unitary, the matrix U
b is unitary and
" # " # " #" #" #
∗ 1 0 1 0 1 0 λ1 ∗ 1 0
U
b AUb = X ∗ AX =
0 U∗ 0 U 0 U∗ 0 B 0 U
T
= = = .
0 U ∗B 0 U 0 U ∗ BU 0 T
DR
" #
λ1 ∗
Since T is upper triangular, is upper triangular.
0 T
Further, if A ∈ Mn (R) and σ(A) has real entries then x1 ∈ Rn with Ax1 = λ1 x1 . Now, one
uses induction once again to get the required result.
Remark 6.4.2. Let A ∈ Mn (C). Then, by Schur’s Lemma there exists a unitary matrix U
such that U ∗ AU = T = [tij ], a triangular matrix. Thus,
Definition 6.4.3. Let A, B ∈ Mn (C). Then, A and B are said to be unitarily equiva-
lent/similar if there exists a unitary matrix U such that A = U ∗ BU .
Remark 6.4.4. We know that if two matrices are unitarily equivalent then they are necessarily
similar as U ∗ = U −1 , for every unitary matrix U . But, similarity doesn’t imply unitary equiv-
alence (see Exercise 6.4.6.5). In numerical calculations, unitary transformations are preferred
as compared to similarity transformations due to the following main reasons:
1. A is unitary implies kAxk = kxk. This need not be true under a similarity.
Exercise 6.4.6.
|tij |2 = tr(A∗ A) − |λi |2 .
P P
1. If A is unitarily similar to a triangular matrix T = [tij ] then
i<j
2. Consider
the following 6 matrices.
√ √ √
2 −1 3 2 2 1 3 2 2 0 3 2
√ √ √
M1 =
0 1 , M2 = 0 1
2 − 2, M3 = 1
1 2,
0 0 3 0 0 3 0 0 1
√
2 0 3 2 1 1 4 2 1 4
√
M4 = −1 1 − 2, M5 = 0 2
2 and M6 = 0
.
1 2
0 0 1 0 0 3 0 0 1
T
Now, use the exercises given below to conclude that the upper triangular matrix obtained
AF
Proof. By Schur’s Lemma there exists a unitary matrix U such that U ∗ AU = T = [tij ], a
n
Q n
Q
triangular matrix. By Remark 6.4.2, σ(A) = σ(T ). Hence, det(A) = det(T ) = tii = αi
i=1 i=1
n n
and tr(A) = tr(A(UU∗ )) = tr(U∗ (AU)) = tr(T) =
P P
tii = αi .
i=1 i=1
170 CHAPTER 6. EIGENVALUES, EIGENVECTORS AND DIAGONALIZABILITY
We now use Schur’s unitary triangularization Lemma to state the main theorem of this sub-
section. Also, recall that A is said to be a normal matrix if AA∗ = A∗ A. Further, Hermitian,
skew-Hermitian and scalar multiples of Unitary matrices are examples of normal matrices.
Theorem 6.4.8. [Spectral Theorem for Normal Matrices] Let A ∈ Mn (C). If A is a normal
matrix then there exists a unitary matrix U such that U ∗ AU = diag(α1 , . . . , αn ).
Proof. By Schur’s Lemma there exists a unitary matrix U such that U ∗ AU = T = [tij ], a
triangular matrix. Since A is a normal
T ∗ T = (U ∗ AU )∗ (U ∗ AU ) = U ∗ A∗ AU = U ∗ AA∗ U = (U ∗ AU )(U ∗ AU )∗ = T T ∗ .
Thus, we see that T is an upper triangular matrix with T ∗ T = T T ∗ . Thus, by Exercise 1.3.13.8,
T is a diagonal matrix and this completes the proof.
We re-write Theorem 6.4.8 in another form to indicate that A can be decomposed into linear
combination of orthogonal projectors onto eigen-spaces. Thus, it is independent of the choice
of eigenvectors. This remark is also valid for Hermitian, skew-Hermitian and Unitary matrices.
(c) the columns of U form a set of orthonormal eigenvectors for A (use Theorem 6.3.3).
(d) A = A · In = A (u1 u∗1 + · · · + un u∗n ) = α1 u1 u∗1 + · · · + αn un u∗n .
Theorem 6.4.8 also implies that if A ∈ Mn (C) is a normal matrix then after a rotation or
reflection of axes (unitary transformation), the matrix A basically looks like a diagonal matrix.
As a special case, we now give the spectral theorem for Hermitian matrices.
Theorem 6.4.10. [Spectral Theorem for Hermitian Matrices] Let A ∈ Mn (C) be a Hermitian
matrix. Then Remark 6.4.9 holds. Further, all the eigenvalues of A are real.
Proof. The first part is immediate from Theorem 6.4.8 as Hermitian matrices are also normal
matrices. Let (α, x) be an eigen-pair. To show, α is a real number.
As A∗ = A and Ax = αx, we have x∗ A = x∗ A∗ = (Ax)∗ = (αx)∗ = αx∗ . Hence,
Corollary 6.4.11. Let A ∈ Mn (R) be symmetric. Then there exists an orthogonal matrix
P and real numbers α1 , . . . , αn such that A = P diag(α1 , . . . , αn )P T . Or equivalently, A is
diagonalizable using orthogonal matrix.
Exercise 6.4.12. 1. Let A be a normal matrix. If all the eigenvalues of A are 0 then prove
that A = 0. What happens if all the eigenvalues of A are 1?
4. Let σ(A) = {λ1 , . . . , λn }. Then, prove that the following statements are equivalent.
(a) A is normal.
(b) A is unitarily diagonalizable.
|aij |2 = |λi |2 .
P P
(c)
i,j i
(d) A has n orthonormal eigenvectors.
T
(b) (λ, x) is an eigen-pair for A∗ . [Hint: Verify kA∗ x − λxk2 = kAx − λxk2 .]
(a) if det(A) = 1 then A is a rotation about a fixed axis, in the sense that A has an
eigen-pair (1, x) such that the restriction of A to the plane x⊥ is a two dimensional
rotation in x⊥ .
172 CHAPTER 6. EIGENVALUES, EIGENVECTORS AND DIAGONALIZABILITY
9. Let A be a normal matrix. Then, prove that Rank(A) equals the number of nonzero
eigenvalues of A.
10. [Equivalent characterizations of Hermitian matrices] Let A ∈ Mn (C). Then, the fol-
lowing statements are equivalent.
(a) The matrix A is Hermitian.
(b) The number x∗ Ax is real for each x ∈ Cn .
(c) The matrix A is normal and has real eigenvalues.
(d) The matrix S ∗ AS is Hermitian for each S ∈ Mn (C).
holds true as a matrix identity. This is a celebrated theorem called the Cayley Hamilton
theorem. We give a proof using Schur’s unitary triangularization. To do so, we look at
multiplication of certain upper triangular matrices.
Lemma 6.4.13. Let A1 , . . . , An ∈ Mn (C) be upper triangular matrices such that the (i, i)-th
entry of Ai equals 0, for 1 ≤ i ≤ n. Then A1 A2 · · · An = 0.
B[:, i] = A1 [:, 1](A2 )1i + A1 [:, 2](A2 )2i + · · · + A1 [:, n](A2 )ni = 0 + · · · + 0 = 0.
B[:, i] = C[:, 1](An )1i + C[:, 2](An )2i + · · · + C[:, n](An )ni = 0 + · · · + 0 = 0.
Theorem 6.4.14. [Cayley Hamilton Theorem] Let A ∈ Mn (C). Then A satisfies its charac-
teristic equation, i.e., if PA (x) = det(xIn − A) = xn − an−1 xn−1 + · · · + (−1)n−1 a1 x + (−1)n a0
then
An − an−1 An−1 + · · · + (−1)n−1 a1 A + (−1)n a0 I = 0
Therefore,
n
Y n
Y h i
PA (A) = (A − αi I) = (U T U ∗ − αi U IU ∗ ) = U (T − α1 I) · · · (T − αn I) U ∗ = U 0U ∗ = 0.
i=1 i=1
" #
1 1 3 2
Further, A2 = −2A + 5I implies A−1 = (A + 2I2 ) = and
5 5 1 −1
(a) Then, for any ` ∈ N, the division algorithm gives α0 , α1 , . . . , αn−1 ∈ C and a poly-
nomial f (x) with coefficients from C such that
iii. In the language of graph theory, it says the following: “Let G be a graph on n
vertices and A its adjacency matrix. Suppose there is no path of length n − 1 or
less from a vertex v to a vertex u in G. Then, G doesn’t have a path from v to u
of any length. That is, the graph G is disconnected and v and u are in different
components of G.”
(b) Suppose A is non-singular. Then, by definition a0 = det(A) 6= 0. Hence,
1
A−1 = a1 I − a2 A + · · · + (−1)n−2 an−1 An−2 + (−1)n−1 An−1 .
a0
T
(c) The above also implies that if A is invertible then A−1 ∈ LS I, A, A2 , . . . . That is,
DR
The next section deals with quadratic forms which helps us in better understanding of conic
sections in analytic geometry.
Lemma 6.5.2. Let A ∈ Mn (C). Then A is Hermitian if and only if at least one of the following
statements hold:
1. S ∗ AS is Hermitian for all S ∈ Mn .
2. A is normal and has real eigenvalues.
3. x∗ Ax ∈ R for all x ∈ Cn .
Remark 6.5.3. Let A ∈ Mn (R). Then the condition x∗ Ax ∈ R, for all x ∈ Cn , in Defini-
tion 6.5.8 implies AT = A, i.e., A is a symmetric matrix. But, when we study matrices over
R, we seldom consider vectors from Cn . So, in such cases, we assume A is symmetric.
" # " #
2 1 3 1+i
Example 6.5.4. 1. Let A = or A = . Then, A is positive definite.
1 2 1−i 4
" # "√ #
1 1 2 1+i
2. Let A = or A = √ . Then, A is positive semi-definite but not positive
1 1 1−i 2
definite.
" # " #
−2 1 −2 1 − i
3. Let A = or A = . Then, A is negative definite.
1 −2 1 + i −2
" # " #
−1 1 −2 1 − i
4. Let A = or A = . Then, A is negative semi-definite.
1 −1 1 + i −1
" # " #
0 1 1 1+i
5. Let A = or A = . Then, A is indefinite.
1 −1 1−i 1
Theorem 6.5.5. Let A ∈ Mn (C). Then, the following statements are equivalent.
1. A is positive semi-definite.
2. A∗ = A and each eigenvalue of A is non-negative.
T
Theorem 6.5.6. Let A ∈ Mn (C). Then, the following statements are equivalent.
1. A is positive definite.
2. A∗ = A and each eigenvalue of A is positive.
3. A = B ∗ B for a non-singular matrix B ∈ Mn (C).
Definition 6.5.8. Let A = [aij ] ∈ Mn (C) be a Hermitian matrix and let x, y ∈ Cn . Then, a
sesquilinear form in x, y ∈ Cn is defined as H(x, y) = y∗ Ax. In particular, H(x, x), denoted
H(x), is called a Hermitian form. In case A ∈ Mn (R), H(x) is called a quadratic form.
2. H(x, y) is ‘linear’ in the first component and ‘conjugate linear’ in the second component.
3. the quadratic form H(x) is a real number. Hence, for α ∈ R, the equation H(x) = α,
represents a conic in Rn .
Example 6.5.10. 1. Let A ∈ Mn (R). Then, f (x, y) = yT Ax, for x, y ∈ Rn , is a bilinear
form on Rn .
" # " #
1 2−i ∗ x
2. Let A = . Then, A = A and for x = ∈ C2 , verify that
2+i 2 y
where ‘Re’ denotes the real part of a complex number, is a sesquilinear form.
The main idea of this section is to express H(x) as sum or difference of squares. Since H(x) is
AF
a quadratic in x, replacing x by cx, for c ∈ C, just gives a multiplication factor by |c|2 . Hence,
DR
one needs to study only the normalized vectors. Let us consider Example 6.1.2 again. There
we see that
(x + y)2 (x − y)2
xT Ax = 3 − = (x + 2y)2 − 3y 2 , and (6.5.1)
2 2
(x + 2y)2 (2x − y)2 2y 50y 2
xT Bx = 5 + 10 = (3x − )2 + . (6.5.2)
5 5 3 9
Note that both the expressions in Equation (6.5.1) is the difference of two non-negative terms.
Whereas, both the expressions in Equation (6.5.2) consists of sum of two non-negative terms.
Is the number of non-negative terms, appearing in the above expressions, just a coincidence?
For a better understanding, we define inertia of a Hermitian matrix.
Definition 6.5.11. Let A ∈ Mn (C) be a Hermitian matrix. The inertia of A, denoted i(A),
is the triplet (i+ (A), i− (A), i0 (A)), where i+ (A) is the number of positive eigenvalues of A,
i− (A) is the number of negative eigenvalues of A and i0 (A) is the nullity of A. The difference
i+ (A) − i− (A) is called the signature of A.
Exercise 6.5.12. Let A ∈ Mn (C) be a Hermitian matrix. If the signature and the rank of A
is known then prove that one can find out the inertia of A.
To proceed with the earlier discussion, let A ∈ Mn (C) be Hermitian with eigenvalues
α1 , . . . , αn . Then, by Theorem 6.4.10, U ∗ AU = D = diag(α1 , . . . , αn ), for some unitary matrix
178 CHAPTER 6. EIGENVALUES, EIGENVECTORS AND DIAGONALIZABILITY
z1
.
U . Let x = U z. Then, kxk = 1 implies kzk = 1. Thus, if z = .
. then
zn
n p r
√ p 2
| αi zi |2 −
X X X
H(x) = z∗ U ∗ AU z = z∗ Dz = αi |zi |2 = |αi | zi , (6.5.3)
i=1 i=1 i=p+1
Lemma 6.5.13. [Sylvester’s Law of Inertia] Let A ∈ Mn (C) be a Hermitian matrix and let
x ∈ Cn . Then, every Hermitian form H(x) = x∗ Ax, in n variables can be written as
where y1 , . . . , yr are linearly independent linear forms in the components of x and the integers
AF
Proof. Equation (6.5.3) implies that H(x) has the required form. We only need to show that
p and r are uniquely determined by A. Hence, let us assume on the contrary that there exist
p, q, r, s ∈ N with p > q such that
Remark 6.5.14. Since A is Hermitian, Rank(A) equals the number of nonzero eigenvalues.
Hence, Rank(A) = r. The number r is called the rank and the number r − 2p is called the
inertial degree of the Hermitian form H(x).
(T S −1 )∗ B(T S −1 ).
Conversely, suppose that A = P ∗ BP , for some invertible matrix P , and i(B) = (k, l, m).
DR
Similarly, X[:, k + 2]∗ AX[:, k + 2] = · · · = X[:, k + l]∗ AX[:, k + l] = −1. As the vectors
X[:, k + 1], . . . , X[:, k + l] are linearly independent, using 9.7.10, we see that A has at least
l negative eigenvalues.
3. Similarly, X[:, 1]∗ AX[:, 1] = · · · = X[:, k]∗ AX[:, k] = 1. As X[:, 1], . . . , X[:, k] are linearly
independent, using 9.7.10 again, we see that A has at least k positive eigenvalues.
Proposition 6.5.18. Consider the quadratic f (x, y) = ax2 + 2hxy + by 2 + 2gx + 2f y + c, for
a, b, c, g, f, h ∈ R. If (a, b, h) 6= (0, 0, 0) then f (x, y) = 0 represents
xT Ax = x, y U U = u v = α1 u2 + α2 v 2 ,
0 α2 y 0 α2 v
DR
" #
u
where = U T x. The lines u = 0, v = 0 are the two linearly independent linear forms, which
v
correspond to two perpendicular lines passing through the origin in the (x, y)-plane. In terms
of u, v, f (x, y) reduces to f (u, v) = α1 u2 + α2 v 2 + d1 u + d2 v + c, for some choice of d1 , d2 ∈ R.
We now look at different cases:
d2 2
f (u, v) = 0 ⇔ α2 v + = c1 − d1 u,
2α2
for some c1 ∈ R.
d2
(a) If d1 = 0, the quadratic corresponds to either the same line v + = 0, two parallel
2α2
lines or two imaginary lines, depending on whether c1 = 0, c1 α2 > 0 and c1 α2 < 0,
respectively.
(b) If d1 6= 0, the quadratic corresponds to a parabola of the form V 2 = 4aU , for some
translate U = u + α and V = v + β.
α1 (u + d1 )2 α2 (v + d2 )2
− = 1.
d3 d3
α1 (u + d1 )2 α2 (v + d2 )2
+ = 1.
d3 d3
Thus, we have considered all the possible cases and the required result follows.
" # " #
u T x
Remark 6.5.19. Observe that the linearly independent forms =U are functions of
v y
T
the eigenvectors u1 and u2 . Further, the linearly independent forms together with the shifting
AF
2 2
2. Let H(x)
" # − 5y + 20xy be the associated
= 10x " quadratic
√ #!
form for "
a class of#!
√
curves. Then
10 10 2/ 5 1/ 5
A= and the eigen-pairs are 15, √ and −10, √ . So, for
10 −5 1/ 5 −2/ 5
182 CHAPTER 6. EIGENVALUES, EIGENVECTORS AND DIAGONALIZABILITY
f (x, y) = 0 ⇔ √ − = 1,
AF
6 3
DR
a hyperbola.
3. Let H(x) = 2 2
" # 6x + 9y + 4xy be the associated
" √quadratic
#! form"for a class
√ #!
of curves. Then,
6 2 1/ 5 2/ 5
A= , and the eigen-pairs are 10, √ and 5, √ . So, for
2 9 2/ 5 −1/ 5
(a) f (x, y) = 6x2 + 9y 2 + 4xy + 10y − 53, we have
x + 2y + 1 2 2x − y − 1 2
f (x, y) = 0 ⇔ + √ = 1,
5 5 2
an ellipse.
6.5. QUADRATIC FORMS 183
1. x2 + 2xy + y 2 + 6x + 10y = 3.
As a last application,
we consider
a
quadratic
in 3 variables,
namely x1 , x2 and x3 . To do so,
a h g x l y
1 1
let A = h b f , x = x2 , b = m and y = y2
with
g f c x3 n y3
f (x1 , x2 , x3 ) = xT Ax + 2bT x + q
= ax21 + bx22 + cx23 + 2hx1 x2 + 2gx1 x3 + 2f x2 x3
+2lx1 + 2mx2 + 2nx3 + q (6.5.6)
3. Depending on the values of αi ’s, rewrite g(y1 , y2 , y3 ) to determine the center and the
planes of symmetry of f (x1 , x2 , x3 ) = 0.
x+y+z 2 x−y 2 x + y − 2z 2
4 √ + √ + √ = −(4x + 2y + 4z + 2).
3 2 6
4(x + y + z) + 5 2 x−y+1 2 x + y − 2z − 1 2
√ √ √ 9
Or equivalently to 4 + + = 12 . So, the
4 3 2 6
principal axes of the quadric (an ellipsoid) are 4(x + y + z) = −5, x − y = 1 and x + y − 2z = 1.
y2 3x2 z2
Part 2 Here f (x, y, z) = 0 reduces to 10 − 10 − 10 = 1 which is the equation of a
hyperboloid consisting of two sheets with center 0 and the axes x, y and z as the principal axes.
3x2 y2 z2
Part 3 Here f (x, y, z) = 0 reduces to 10 − 10 + 10 = 1 which is the equation of a
hyperboloid consisting of one sheet with center 0 and the axes x, y and z as the principal axes.
T
Part 4 Here f (x, y, z) = 0 reduces to z = y 2 −3x2 +10 which is the equation of a hyperbolic
AF
paraboloid.
DR
Figure 6.5: Ellipsoid, hyperboloid of two sheets and one sheet, hyperbolic paraboloid
.
matrix A. We will do it over complex numbers and hence, the ideas from Theorem 6.4.10 will
be used. We start with the following result.
Lemma 6.6.1. Let A ∈ Mm,n (C) with m ≤ n and RankA = k ≤ m. Then A = U DV ∗ , where
2. D = Λ1/2 , and
3. V ∗ is formed by taking the first k rows of U ∗ A and adding m − k new rows so that V ∗ has
orthonormal rows.
Hence, the matrix AA∗ is a positive semi-definite matrix. Therefore, all it’s eigenvalues are
non-negative. So, by the spectral theorem, Theorem 6.4.10, AA∗ = U ΛU ∗ , where λii ≥ 0 are
in decreasing order. As RankA = k, λii > 0, for 1 ≤ i ≤ k and λii = 0, for k + 1 ≤ i ≤ m. Now,
T
( √
DR
1/ λii , if i ≤ k
σii =
1, otherwise.
1 1
X[1, :] = √ (U ∗ A)[1, :], . . . , X[k, :] = √ (U ∗ A)[k, :].
λ11 λkk
Or equivalently,
Now, take these k rows of X and add m − k many rows to form V ∗ , so that the rows of V ∗ are
orthonormal, i.e., V ∗ V = Im . Also, using (6.6.8), we see that (ΣU ∗ AA∗ U Σ)k+1,k+1 = 0. Thus,
A∗ A = (U DV ∗ )∗ (U DV ∗ ) = (V DU ∗ )(U DV ∗ ) = V D2 V ∗ ,
where D2 = diag(λ11 , . . . , λkk , 0, . . . 0) are the eigenvalues of A∗ A and the columns of V are the
corresponding eigenvectors.
T
AF
Corollary 6.6.2. [Polar decomposition] Let A ∈ Mm,n (C) with m ≤ n. Then A = P W , for
some positive semi-definite matrix P with RankP = RankA and a matrix W having orthonormal
DR
Corollary 6.6.3. [Singular value decomposition] Let A ∈ Mm,n (C) with m ≤ n and RankA =
k ≤ m. Then A = U DV ∗ , where
3. V ∗ is formed by taking the first k rows of U ∗ A and adding n − k new rows so that V is
an n × n unitary matrix.
√ √
Definition 6.6.4. Let A ∈ Mm,n . In view of Corollary 6.6.3, the values λ11 , . . . , λrr , where
r = min{m, n}, are called the singular values of A. (Sometimes only the nonzero λii ’s are
understood to be the singular values of A).
Let A ∈ Mm,n (C). Then, by the singular value decomposition of A we mean writing
A = U ΣV ∗ , where U ∈ Mm (C), V ∈ Mn (C) are unitary matrices and Σ ∈ Mm,n (R) with Σii as
the singular values of A, for 1 ≤ i ≤ RankA and the remaining entries of Σ being 0.
In Corollary 6.6.3, we saw that the matrix U is obtained as the unitary matrix in the spectral
decomposition of AA∗ , the Σii ’s are the square-root of the eigenvalues of AA∗ , and V ∗ is formed
by taking the first r = RankA rows of U ∗ A and adding n − k new rows so that V ∗ is a unitary
matrix.
Now, let us go backi to matrixh multiplication and ∗
h i try to understand A = U ΣV . So, let
U = u1 u2 · · · um and V = v1 v2 · · · vm . Then,
√
λ11 0 0 · · · 0 0 ∗
√ v1
0 λ22 0 · · · 0 0 ∗
h i √ v2
A = U ΣV ∗ = u1 u2 · · · um 0 0 λ33 · · · 0 0
.
.. .. .. . . .. .. ..
. . . . . .
∗
vn
0 0 0 ··· 0 0
λ11 u1 v1∗ + λ22 u2 v2∗ + · · · + λmm um vm
∗
p p p
= . (6.6.10)
T
AF
states that there exist unitary matrices (rotation or reflection matrices) in both the domain
(Cn , corresponds to V ∗ ) and co-domain (Cm , corresponds to U ) such that the matrix of A with
respect to these ordered bases is a diagonal matrix and the diagonal entries consist of just the
singular values, including zeros.
√ √ √
We also note that if r = RankA then A = λ11 u1 v1∗ + λ22 u2 v2∗ + · · · + λrr ur vr∗ . Thus,
A = U1 Σ1 V1∗ , where U1 is a submatrix of U consisting of the first r orthonormal columns, Σ1
is a diagonal matrix of non-zero singular values and V1∗ is a submatrix of V ∗ consisting of the
first r orthonormal rows. More specifically,
√
λ11 0 v∗··· 0
√ 1∗
h i 0 λ22 · · · 0 v2
A = u1 u2 · · · ur . .
.. .. .
0 0 . .
.
√
0 0 ··· λrr vr∗
" # " #
2 1 1 6 3
Example 6.6.5. Let A = . Then, AAT = . Thus, AAT = U DU T , where
1 2 −1 3 6
" # " # " #
1 1 1 9 0 3 0 0
U=√ and D = . Hence, Σ = √ . Here,
2 1 −1 0 3 0 3 0
#" 1
√1
" # " #
T 1 3 3 0 3 0 √
2 2
0
U A= √ = √ .
2 1 −1 2 0 3 √16 − √16 √26
188 CHAPTER 6. EIGENVALUES, EIGENVECTORS AND DIAGONALIZABILITY
√1 √1 0 5 4 1
12 2
Thus, V T = − √16 √2 and it’s rows are the eigenvectors of AT A = 4
5 −1.
√
6 6
√1 −1
√ −1
√ 1 −1 2
3 3 3
In actual computations, the values of m and n could be very large. Also, the largest and
the smallest eigenvalues or the rows and columns of A that are of interest to us may be very
small. So, in such cases, we compute the singular value decomposition to relate the above ideas
or to find clusters which have maximum influence on the problem being looked. For example,
in the above computation, the singular value 3 is the larger of the two singular values. So, if
we are looking at the largest deviation or movement etc. then we need to concentrate on the
singular value 3. Then, using equation (6.6.10), note that 3 is associated with the first column
√
of U and the first row of V T . Similarly, 3 is associated with the second column of U and the
second row of V T .
Note that in any computation, we need to decompose our problem into sub-problems. If
the decomposition into sub-problems is possible through orthogonal decomposition then in some
sense the sub-problems can be handled separately. That’s how the singular value decomposition
helps us in applications. This is the reason, that with slight change, SVD is also called “factor
analysis” or “principal component analysis” and so on.
Exercise 6.6.6. 1. Let A ∈ Mm,n (C) with m ≥ n. Then A = W Q, for some positive
semi-definite matrix Q and a matrix W of orthonormal columns.
T
AF
2. Let A ∈ Mn,1 (C). Illustrate the polar decomposition and the singular value decompositions
for A = ei and for A = e1 + 2e2 + · · · + nen .
DR
3. Let A ∈ Mm,n (C) with RankA = r. If d1 , . . . , dr are the non-zero singular values of A
then, there exist Σ ∈ Mm,n (R),"and unitary
# matrices U ∈ Mm (C) and V ∈ Mn (C) such
Σ1 0
that A = U ΣV ∗ , where Σ = with Σ1 = diag(d1 , . . . , dr ). Then, prove that
0 0
" #
−1
Σ1 0
G = V DU ∗ , for D = ∈ Mn,m (C) is the pseudo-inverse of A.
0 0
Chapter 7
k
AF
M
W −1 AW = Ti , where, Ti ∈ Mmi (C), for 1 ≤ i ≤ k
DR
i=1
and Ti ’s are upper triangular matrices with constant diagonal λi . If A has real entries with real
eigenvalues then W can be chosen to have real entries.
Proof. By Schur Upper Triangularization (see Lemma 6.4.1), there exists a unitary matrix U
such that U ∗ AU = T , an upper triangular matrix with diag(T ) = (λ1 , . . . , λ1 , . . . , λk , . . . , λk ).
Now, for any upper triangular matrix B, a real number α and i < j, consider the matrix
F (B, i, j, α) = Eij (−α)BEij (α), where the matrix Eij (α) is defined in Definition 2.2.5. Then,
for 1 ≤ k, ` ≤ n,
Bij − αBjj + αBii , whenever k = i, ` = j
B − αB , whenever ` 6= j
i` j`
(F (B, i, j, α))k` = (7.1.1)
Bkj + αBki , whenever k 6= i
Bk` , otherwise.
Now, using Equation (7.1.1), the diagonal entries of F (T, i, j, α) and T are equal and
(
Tij , whenever Tjj = Tii
(F (T, i, j, α))ij = Tij
0, whenever Tjj 6= Tii and α = Tjj −Tii .
Thus, if we denote the matrix F (T, i, j, α) by T1 then (F (T1 , i − 1, j, α))i−1,j = 0, for some
choice of α, whenever (T1 )i−1,i−1 6= Tjj . Moreover, this operation also preserves the 0 created by
189
190 CHAPTER 7. JORDAN CANONICAL FORM
F (T, i, j, α) at (i, j)-th place. Similarly, F (T1 , i, j + 1, α) preserves the 0 created by F (T, i, j, α)
at (i, j)-th place. So, we can successively apply the following sequence of operations to get
where α, β, . . . , γ are appropriately chosen and Tm1 [:, m1 + 1] = λ2 em1 +1 . Thus, observe that
the above operation can be applied for different choices of i and j with i < j to get the required
result.
Practice 7.1.2. Apply Theorem 7.1.1 to the matrix given below for better understanding.
1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8
0 0 1 2 3 4 5 6 7
0 0 0 2 3 4 5 6 7
0 0 0 0 2 3 4 5 6.
0 0 0 0 0 2 3 4 5
0 0 0 0 0 0 3 4 5
0 0 0 0 0 0 0 3 4
0 0 0 0 0 0 0 0 3
Definition 7.1.3. 1. Let λ ∈ C and k be a positive integer. Then, by the Jordan block
T
λ 1
DR
.. ..
. .
.
λ 1
λ
2. A Jordan matrix is a direct sum of Jordan blocks. That is, if A is a Jordan matrix
having r blocks then there exist positive integers ki ’s and complex numbers λi ’s (not
necessarily distinct), for 1 ≤ i ≤ r such that
5. Observe that the number of Jordan matrices of size 4 with 0 on the diagonal are 5.
We now give some properties of the Jordan blocks. The proofs are immediate and hence left
for the reader. They will be used in the proof of subsequent results.
(b) Rank(Jk (λ) − αIk ) = k, whenever α 6= λ. Or equivalently, for all α 6= λ the matrix
DR
2. Let J be a Jordan matrix that contains ` Jordan blocks for λ. Then, prove that
(a) Rank(J − λI) = n − `.
(b) J has ` linearly independent eigenvectors for λ.
(c) Rank(J − λI) ≥ Rank((J − λI)2 ) ≥ Rank((J − λI)3 ) ≥ · · · .
3. Let A ∈ Mn (C). Then, prove that AJn (λ) = Jn (λ)A if and only if AJn (0) = Jn (0)A.
Definition 7.1.7. Let J be a Jordan matrix containing Jt (λ), for some positive integer t
and some complex number λ. Then, the smallest value of k for which Rank((J − λI)k ) stops
decreasing is the order of the largest Jordan block Jk (λ) in J. This number k is called the
index of the eigenvalue λ.
Lemma 7.1.8. Let A ∈ Mn (C) be strictly upper triangular. Then, A is similar to a direct sum
of Jordan blocks. Or equivalently, there exists integers n1 ≥ . . . ≥ nm ≥ 1 and a non-singular
matrix S such that
A = S −1 Jn1 (0) ⊕ · · · ⊕ Jnm (0) S.
Proof. We will prove the result by induction on n. For n = 1, the statement is trivial. So, let
the result be # matrices of size ≤ n − 1 and let A ∈ Mn (C) be strictly upper triangular.
" true for
0 a T
Then, A = . By induction hypothesis there exists an invertible matrix S1 such that
0 A1
m
A1 = S1−1 Jn1 (0) ⊕ · · · ⊕ Jnm (0) S1 with
X
ni = n − 1.
i=1
Thus,
" # " # " #" #" # " # 0 aT1 aT2
1 0 1 0 1 0 0 aT 1 0 0 a T S1
A = = =
0 Jn1 (0) 0 ,
0 S1−1 0 S1 0 S1−1 0 A1 0 S1 0 S −1 A1 S1
0 0 J
h i
where S1−1 Jn1 (0) ⊕ · · · ⊕ Jnm (0) S1 = Jn1 (0) ⊕ J and aT S1 = aT1 aT2 . Now, writing Jn1 to
mean Jn1 (0) and using Remark 7.1.5.4e, we have
1 −aT1 JnT1 0 0 aT1 aT2 1 aT1 JnT1 0 0 ha1 , e1 ieT1 aT2
= 0 .
0 In1 0
0 Jn1 0 0 In1 0 Jn1 0
0 0 I 0 0 J 0 0 I 0 0 J
n1
first case, A is similar to 0 Jn1 0 . This in turn is similar to 0 0 aT2
AF
by permuting
0 0 J 0 0 J
DR
the first row and column. At this stage, one can apply induction and if necessary do a block
permutation, in order to keep the block sizes in decreasing order.
So, let us now assume that ha1 , e1 i =
6 0. Then, writing α = ha1 , e1 i, we have
1 T aT T T
0 0 0 αe 1 2 α 0 0 0 e1 a 2
" #
α Jn1 +1 e a
1 2
T
0 I 0 = 0 Jn1 0 ≡ .
0 I 0 0 Jn 0
1
1
0 J
0 0 αI 0 0 J 0 0 αI 0 0 J
If necessary, we need to do a block permutation, in order to keep the block sizes in decreasing
order. Hence, the required result follows.
0 1 1 0 1 2
Practice 7.1.9. Convert 0 0 1 to J3 (0) and 0 0 0 to J2 (0) ⊕ J1 (0).
0 0 0 0 0 0
7.1. JORDAN CANONICAL FORM THEOREM 193
a Jordan matrix with 0 on the diagonal and the size of the Jordan blocks decreases as we move
down the diagonal. So, Si−1 Ti Si = J(λi ) is a Jordan matrix with λi on the diagonal and the
size of the Jordan blocks
k decreases
as we move down the diagonal.
Si . Then, verify that W −1 AW is a Jordan matrix.
L
Now, take W = S
i=1
Let A ∈ Mn (C). Suppose λ ∈ σ(A) and J is a Jordan matrix that is similar to A. Then, for
each fixed i, 1 ≤ i ≤ n, by `i (λ), we denote the number of Jordan blocks Jk (λ) in J for which
k ≥ i. Then, the next result uses Exercise 7.1.6 to determine the number `i (λ).
Remark 7.1.11. Let A ∈ Mn (C). Suppose λ ∈ σ(A) and J is a Jordan matrix that is similar
to A. Then, for 1 ≤ k ≤ n,
Proof. In view of Exercise 7.1.6, we need to consider only the Jordan blocks Jk (λ), for different
AF
n
L
values of k. Hence, without loss of generality, let us assume that J = ai Ji (λ), where ai ’s are
DR
i=1
non-negative integers and J contains exactly ai copies of the Jordan block Ji (λ), for 1 ≤ i ≤ n.
Then, by definition and Exercise 7.1.6, we observe the following:
P
1. n = iai .
i≥1
P
2. Rank(J − λI) = (i − 1)ai .
i≥2
3. Rank((J − λI)2 ) =
P
(i − 2)ai .
i≥3
X X X
`2 = ai = (i − 1)ai − (i − 2)ai = Rank(J − λI) − Rank((J − λI)2 ),
i≥2 i≥2 i≥3
..
.
X X X
`k = ai = (i − (k − 1))ai − (i − k)ai = Rank((J − λI)k−1 ) − Rank((J − λI)k ).
i≥k i≥k i≥k+1
Now, the required result follows as rank is invariant under similarity operation and the matrices
J and A are similar.
194 CHAPTER 7. JORDAN CANONICAL FORM
Lemma 7.1.12. [Similar Jordan matrices] Let J and J 0 be two similar Jordan matrices of
size n. Then, J is a block permutation of J 0 .
Proof. For 1 ≤ i ≤ n, let `i and `0i be, respectively, the number of Jordan blocks of J and J 0
of size at least i corresponding to λ. Since J and J 0 are similar, the matrices (J − λI)i and
(J 0 − λI)i are similar for all i, 1 ≤ i ≤ n. Therefore, their ranks are equal for all i ≥ 1 and
hence, `i = `0i for all i ≥ 1. Thus the required result follows.
We now state the main result of this section which directly follows from Lemma 6.4.1,
Theorem 7.1.1 and Corollary 7.1.10 and hence the proof is omitted.
Theorem 7.1.13. [Jordan canonical form theorem] Let A ∈ Mn (C). Then, A is similar to
a Jordan matrix J, which is unique up to permutation of Jordan blocks. If A ∈ Mn (R) and has
real eigenvalues then the similarity transformation matrix S may be chosen to have real entries.
This matrix J is called the the Jordan canonical form of A, denoted Jordan CF(A).
Example 7.1.14. Let us use the idea from Lemma 7.1.11 to find the Jordan Canonical Form
of the following matrices.
0 0 1 0
0 0 0 1
1. Let A = J4 (0)2 = .
T
0 0 0 0
AF
0 0 0 0
DR
Solution: Note that `1 = 4 − Rank(A − 0I) = 2. So, there are two Jordan blocks.
Also, `2 = Rank(A − 0I) − Rank((A − 0I)2 ) = 2. So, there are at least 2 Jordan blocks of
size 2. As there are exactly two Jordan blocks, both the blocks must have size 2. Hence,
Jordan CF(A) = J2 (0) ⊕ J2 (0).
1 1 0 1
0 1 1 1
2. Let A1 =
.
0 0 1 1
0 0 0 1
Solution: Let B = A1 − I. Then, `1 = 4 − Rank(B) = 1. So, B has exactly one Jordan
block and hence A1 is similar to J4 (1).
1 1 0 1
0 1 1 1
3. A2 = 0
.
0 1 0
0 0 0 1
Solution: Let C = A2 − I. Then, `1 = 4 − Rank(C) = 2. So, C has exactly two Jordan
blocks. Also, `2 = Rank(C) − Rank(C 2 ) = 1 and `3 = Rank(C 2 ) − Rank(C 3 ) = 1. So, there
is at least 1 Jordan blocks of size 3.
Thus, we see that there are two Jordan blocks and one of them is of size 3. Also, the size
of the matrix is 4. Thus, A2 is similar to J3 (1) ⊕ J1 (1).
7.1. JORDAN CANONICAL FORM THEOREM 195
i=1
k
AF
L
T = Ti into the required Jordan matrix.
i=1
DR
n
2. Let A ∈ Mn (C) be a diagonalizable matrix. Then, by definition, A is similar to
L
λi ,
i=1
n
L
where λi ∈ σ(A), for 1 ≤ i ≤ n. Thus, Jordan CF(A) = λi , up to a permutation of
i=1
λi ’s.
3. In general, the computation
" # of Jordan CF(A) is not numerically stable. To understand
0
this, let A = . Then, A is diagonalizable as A has distinct eigenvalues. So,
1 0
" #
0
Jordan CF(A ) = .
0 0
" # " #
0 0 0 1
Whereas, for A = , we know that Jordan CF(A) = 6= lim Jordan CF(A ).
1 0 0 0 →0
Thus, a small change in the entries of A may change Jordan CF(A) significantly.
4. Let A ∈ Mn (C) and > 0 be given. Then, there exists an invertible matrix S such
k
that S −1 AS =
L
Jni (λi , ), where Jni (λi , ) is obtained from Jni (λi ) by replacing each
i=1
off diagonal entry 1 by an . To get this, define Di() = diag(1, , 2 , . . . , ni −1 ), for
k
(Di())−1 Jni (λi )Di() .
L
1 ≤ i ≤ k. Now compute
i=1
5. Let Jordan CF(A) contain ` Jordan blocks for λ. Then, A has ` linearly independent
eigenvectors for λ.
196 CHAPTER 7. JORDAN CANONICAL FORM
For if, A has at least ` + 1 linearly independent eigenvectors for λ, then dim(Null(A −
λI)) > `. So, Rank(A − λI) < n − `. But, the number of Jordan blocks for λ in A is `.
Thus, we must have Rank(J − λI) = n − `, a contradiction.
6. Let λ ∈ σ(A). Then, by Remark 7.1.5.5, Geo.Mulλ (A) = the number of Jordan blocks
Jk (λ) in Jordan CF(A).
7. Let λ ∈ σ(A). Then, by Remark 7.1.5.3, Alg.Mulλ (A) = the sum of the sizes of all
Jordan blocks Jk (λ) in Jordan CF(A).
8. Let λ ∈ σ(A). Then, Jordan CF(A) does not get determined
by Alg.Mulλ (A) and
" # 0 1 0 " # " # " #
h i 0 1 0 1 0 1 0 1
Geo.Mulλ (A). For example, 0 ⊕ ⊕0 0 1 and 0 0 ⊕ 0 0 ⊕ 0 0
0 0
0 0 0
are different Jordan CFs but they have the same algebraic and geometric multiplicities.
9. Let A ∈ Mn (C). Suppose that, for each λ ∈ σ(A), the values of Rank(A − λI)k , for
k = 1, . . . , n are known. Then, using Remark 7.1.11, Jordan
h i CF(A) can be computed.
But, note here that finding rank is numerically unstable as has rank 1 but it converges
h i
to 0 which has a different rank.
Proof. Let Kn = .
1
DR
Definition 7.2.1. Let P (t) = tn + an−1 tn−1 + · · · + a0 be a monic polynomial in t of degree
0 0 0 · · · 0 −a0
1 0 0
··· 0 −a1
0 1 0 · · · 0 −a
2
n. Then, the n × n matrix A = .. . . .. .. , denoted A(n : a0 , . . . , an−1 ) or
T
0 0 . . . .
AF
0 0 0
· · · 0 −an−2
DR
0 0 0 1 −an−1
Companion(P ), is called the companion matrix of P (t).
Remark 7.2.2. Let A ∈ Mn (C) and let f (x) = xn +an−1 xn−1 +· · ·+a1 x+a0 be its characteristic
polynomial. Then by the Cayley Hamilton Theorem, An + an−1 An−1 + · · · + a1 A + a0 I = 0.
Hence An = −(an−1 An−1 + · · · + a1 A + a0 I).
Suppose, there exists a vector u ∈ Cn such that B = u, Au, A2 u, . . . , An−1 u is an ordered
Definition 7.2.3. Let A ∈ Mn (C). Then, the polynomial P (t) is said to annihilate (destroy)
A if P (A) = 0.
198 CHAPTER 7. JORDAN CANONICAL FORM
Definition 7.2.4. Let A ∈ Mn (C). Then, the minimal polynomial of A, denoted mA (x), is
a monic polynomial of least positive degree satisfying mA (A) = 0.
Theorem 7.2.5. Let A be the companion matrix of the monic polynomial P (t) = tn +an−1 tn−1 +
· · · + a0 . Then, P (t) is both the characteristic and the minimal polynomial of A.
Now, Suppose we have a monic polynomial Q(t) = tm + bm−1 tm−1 + · · · + b0 , with m < n,
such that Q(A) = 0. Then, using Equation (7.2.1), we get
Lemma 7.2.6. [Existence of the minimal polynomial] Let A ∈ Mn (C). Then, there exists a
unique monic polynomial m(x) of minimum (positive) degree such that m(A) = 0. Further, if
f (x) is any polynomial with f (A) = 0 then m(x) divides f (x).
Proof. Let P (x) be the characteristic polynomial of A. Then, deg(P (x)) = n and by the
Cayley-Hamilton Theorem, P (A) = 0. So, consider the set
Also, without loss of generality, we can assume that m(x) is monic and unique (non-
uniqueness will lead to a polynomial of smaller degree in S).
Now, suppose there is a polynomial f (x) such that f (A) = 0. Then, by division algorithm,
there exist polynomials q(x) and r(x) such that f (x) = m(x)q(x) + r(x), where either r(x) is
identically the zero polynomial of deg(r(x)) < M = deg(m(x)). As
we get r(A) = 0. But, m(x) was the least degree polynomial with m(A) = 0 and hence r(x) is
the zero polynomial. That is, m(x) divides f (x).
As an immediate corollary, we have the following result.
Corollary 7.2.7. [Minimal polynomial divides the characteristic polynomial] Let mA (x)
and PA (x) be, respectively, the minimal and the characteristic polynomials of A ∈ Mn (C).
1. Then, mA (x) divides PA (x).
Proof. The first part following directly from Lemma 7.2.6. For the second part, let (λ, x) be an
eigen-pair. Then, f (A)x = f (λ)x, for any polynomial of f , implies that
mA (λ)x = mA (A)x = 0x = 0.
T
AF
Lemma 7.2.8. Let A and B be two similar matrices. Then, they have the same minimal
polynomial.
Proof. Since A and B are similar, there exists an invertible matrix S such that A = S −1 BS.
Hence, f (A) = F (S −1 BS) = S −1 f (B)S, for any polynomial f . Hence, mA (A) = 0 if and only
if mA (B) = 0 and thus the required result follows.
k
(x−λi )αi , for some αi ’s with 1 ≤ αi ≤ Alg.Mulλi (A).
Q
Proof. Using 7.2.7, we see that mA (x) =
i=1
k
(J − λi I)αi = 0. But, observe that
Q
As mA (A) = 0, using Lemma 7.2.8 we have mA (J) =
i=1
for the Jordan block Jni (λi ), one has
1. (Jni (λi ) − λi I)αi = 0 if and only if αi ≥ ni , and
k k k k
(J − λi I)ni = 0 and (x − λi )ni divides (x − λi )αi = mA (x) and (x − λi )ni
Q Q Q Q
Thus
i=1 i=1 i=1 i=1
is a monic polynomial, the result follows.
As an immediate consequence, we also have the following result which corresponds to the
converse of the above theorem.
Theorem 7.2.10. Let A ∈ Mn (C) and let λ1 , . . . , λk be the distinct eigenvalues of A. If the
k
(x − λi )ni then ni is the size of the largest Jordan block for
Q
minimal polynomial of A equals
i=1
λi in J = Jordan CFA.
Theorem 7.2.11. Let A ∈ Mn (C). Then, the following statements are equivalent.
1. A is diagonalizable.
2. Every zero of mA (x) has multiplicity 1.
d
3. Whenever mA (α) = 0, for some α, then mA (x)x=α 6= 0.
dx
Proof. Part 1 ⇒ Part 2. If A is diagonalizable, then each Jordan block in J = Jordan CFA
k
Q
has size 1. Hence, by Theorem 7.2.9, mA (x) = (x−λi ), where λi ’s are the distinct eigenvalues
i=1
of A.
k
T
Q
Part 2 ⇒ Part 3. Let mA (x) = (x − λi ), where λi ’s are the distinct eigenvalues of A.
AF
i=1
Then, mA (x) = 0 if and only if x = some i, 1 ≤ i ≤ k. In that case, it is easy to verify
λi , for
d
DR
10. Let A ∈ Mn (C) be an invertible matrix. Then prove that if the minimal polynomial of A
equals m(x, λ1 , . . . , λk ) then the minimal polynomial of A−1 equals m(x, 1/λ1 , . . . , 1/λk ).
DR
11. Let λ an eigenvalue of A ∈ Mn (C) with two linearly independent eigenvectors. Show that
there does not exist a vector u ∈ Cn such that LS u, Au, A2 u, . . . = Cn .
We end this section with a method to compute the minimal polynomial of a given matrix.
Example 7.2.14. [Computing the minimal polynomial] Let λ1 , . . . , λk be the distinct eigen-
values of A ∈ Mn (C).
using S −1 A = JS −1 . verify that the initial problem x0 (t) = Ax(t) is equivalent to the equation
S −1 x0 (t) = S −1 Ax(t) which in turn is equivalent to y0 (t) = Jy(t), where S −1 x(t) = y(t) with
y(0) = S −1 x(0) = 0. Therefore, if y is a solution to the second equation then x(t) = Sy is a
solution to the initial problem.
When J is diagonalizable then solving the second is as easy as solving yi0 (t) = λi yi (t) for
which the required solution is given by yi (t) = yi (0)eλi t .
If J is not diagonal, then for each Jordan block, the system reduces to
This problem can also be solved as in this case the solution is given by yk = c0 eλt ; yk−1 =
(c0 t + c1 )eλt and so on.
Let P (x) be a polynomial and A ∈ Mn (C). Then, P (A)A = AP (A). What about the converse?
That is, suppose we are given that AB = BA for some B ∈ Mn (C). Does it necessarily imply
that B = P (A), for some nonzero polynomial P (x)? The answer is No as I commutes with A
for every A. We start with a set of remarks.
Theorem 7.3.1. Let A ∈ Mn (C) and B ∈ Mm (C). Then, the linear system AX − XB = 0, in
T
AF
the variable matrix X of size n × m, has a unique solution, namely X = 0 (the trivial solution),
if and only if σ(A) and σ(B) are disjoint.
DR
Thus, we see that if λ is a common eigenvalue of A and B then the system AX − XB = 0 has
a nonzero solution X0 , a contradiction. Hence, the required result follows.
Corollary 7.3.2. Let A ∈ Mn (C), B ∈ Mm (C) and C be an n × m matrix. Also, assume that
σ(A) and σ(B) are disjoint. Then, it can be easily verified that the system AX − XB = C, in
the variable matrix X of size n × m, has a unique solution, for any given C.
7.3. APPLICATIONS OF JORDAN CANONICAL FORM 203
Proof. Consider the linear transformation T : Mn,m (C) → Mn,m (C), defined by T (X) =
AX − XB. Then, by Theorem 7.3.1, Null(T ) = {0}. Hence, by the rank-nullity theorem, T
is a bijection and the required result follows.
a3 a2 a1 b1
b1 b2 b3 b4
0 b1 b2 b3
Toeplitz type matrix. and the matrix B = is an upper triangular Toeplitz
0 0 b1 b2
0 0 0 b1
type matrix.
Exercise 7.3.4. Let Jn (0) ∈ Mn (C) be the Jordan block with 0 on the diagonal.
1. Further, if A ∈ Mn (C) such that AJn (0) = Jn (0)A then prove that A is an upper Toeplitz
type matrix.
2. Further, if A, B ∈ Mn (C) are two upper Toeplitz type matrices then prove that
(a) there exists ai ∈ C, 1 ≤ i ≤ n, such that A = a0 I + a1 Jn (0) + · · · + an Jn (0)n−1 .
T
AF
To proceed further, recall that a matrix A ∈ Mn (C) is called non-derogatory if Geo.Mulα (A) =
1, for each α ∈ σ(A) (see Definition 6.3.9).
Theorem 7.3.5. Let A ∈ Mn (C) be a non-derogatory matrix. Then, the matrices A and B
commute if and only if B = P (A), for some polynomial P (t) of degree at most n − 1.
Proof. If B = P (A), for some polynomial P (t), then A and B commute. Conversely, suppose
that AB = BA, σ(A) = {λ1 , . . . , λk } and let J = Jordan CFA = S −1 AS be the Jordan matrix
Jn1 (λ1 ) B 11 · · · B 1k
.. . Now, write B = S −1 BS = .. . . . ..
of A. Then, J = . . ., where
Jnk (λk ) B k1 · · · B kk
B is partitioned conformally with J. Note that AB = BA gives JB = BJ. Thus, verify that
Fi (Jni (λi ))−1 B ii = c1 I + c2 Jni (0) + · · · + cni Jni (0)ni −1 = Ri (Jni (λi )), (say).
Thus, B ii = Fi (Jni (λi ))Ri (Jni (λi )). Putting Pi (t) = Fi (t)Ri (t), for 1 ≤ i ≤ k, we see that Pi (t)
is a polynomial of degree at most n − 1 with Pi ((Jnj (λj )) = 0, for j 6= i and Pi ((Jnj (λi )) = B ii .
Taking, P = P1 + · · · + Pk , we have
Jn1 (λ1 ) Jn1 (λ1 )
P (J) = P1
..
+ · · · + Pk ..
. .
Jnk (λk ) Jnk (λk )
B 11 0
= ..
+ · · · + ...
= B.
.
0 B kk
Hence, B = SBS −1 = SP (J)S −1 = P (SJS −1 ) = P (A) and the required result follows.
T
AF
DR
Chapter 8
Advanced Topics on
Diagonalizability and
Triangularization∗
We start this subsection with a few definitions and examples. So, it will be nice to recall the
T
notations used in Section 1.5 and a few results from Appendix 9.2.
AF
1. Also, let S ⊆ [n]. Then, det (A[S, S]) is called the Principal minor of A corresponding
to S.
1 2 3 4
5 6 7 8
Example 8.1.3. Let A =
9
. Then, note that
8 7 6
5 4 3 2
1. EM1 (A) = 1 + 6 + 7 + 2 = 16 and EM2 (A) = det A({1, 2}, {1, 2}) + det A({1, 3}, {1, 3}) +
det A({1, 4}, {1, 4}) + det A({2, 3}, {2, 3}) + det A({2, 4}, {2, 4}) + det A({3, 4}, {3, 4}) =
−80.
205
206CHAPTER 8. ADVANCED TOPICS ON DIAGONALIZABILITY AND TRIANGULARIZATION∗
n
1. the coefficient of tn−k in PA (t) =
Q
(t − λi ), the characteristic polynomial of A, is
i=1
X
(−1)k λi1 · · · λik = (−1)k Sk (λ1 , . . . , λn ). (8.1.1)
i1 <···<ik
For all i ∈ S, consider all permutations σ such that σ(i) = i. Our idea is to select a ‘t’ from
AF
these biσ(i) . Since we do not want any more ‘t’, we set t = 0 for any other diagonal position. So
DR
the contribution from S to the coefficient of tn−k is det[−A(S|S)] = (−1)k det A(S|S). Hence
the coefficient of tn−k in PA (t) is
X X
(−1)k det A(S|S) = (−1)k det A[T, T ] = (−1)k Ek (A).
S⊆[n], |S|=n−k T ⊆[n], |T |=k
Let A and B be similar matrices. Then, by Theorem 6.2.3, we know that σ(A) = σ(B).
Thus, as a direct consequence of Part 2 of Theorem 8.1.4 gives the following result.
Corollary 8.1.6. Let A and B be two similar matrices of order n. Then, EMk (A) = EMk (B)
for 1 ≤ k ≤ n.
So, the sum of principal minors of similar matrices are equal. Or in other words, the sum
of principal minors are invariant under similarity.
Proof. For 1 ≤ i ≤ n, let us denote A(i|i) by Ai . Then, using Equation (8.1.3), we have
n
X X X X
PAi (t) = tn−1 − EM1 (Ai )tn−2 + · · · + (−1)n−1 EMn−1 (Ai )
i=1 i i i
= ntn−1 − (n − 1)EM1 (A)tn−2 + (n − 2)EM2 (A)tn−3 − · · · + (−1)n−1 EMn−1 (A)
= PA0 (t).
Proof. As Alg.Mulα (A) = 1, PA (t) = (t − λ)q(t), where q(t) is a polynomial with q(λ) 6=
0. Thus PA0 (t) = q(t) + (t − λ)q 0 (t). Hence, PA0 (λ) = q(λ) 6= 0. Thus, by Corollary 8.1.7,
0
P
i PA(i|i) (λ) = PA (λ) 6= 0. Hence, there exists i, 1 ≤ i ≤ n such that PA(i|i) (λ) 6= 0. That is,
det[A(i|i) − λI] 6= 0 or Rank[A − λI] = n − 1.
" #
0 1
Remark 8.1.9. Converse of Corollary 8.1.8 is false. Note that for the matrix A = ,
0 0
Rank[A − 0I] = 1 = 2 − 1 = n − 1, but 0 has multiplicity 2 as a root of PA (t) = 0.
1. Geo.Mulλ (A) ≥ k.
2. If B is a principal sub-matrix of A of size m > n − k then λ ∈ σ(B).
3. Alg.Mulλ (A) ≥ k.
Proof. Part 1⇒ Part 2. Let {x1 , . . . , xk } be linearly independent eigenvectors for λ and
" let #
B
B ∗
be a principal sub-matrix of A of size m > n − k. Without loss, we may write A = .
∗ ∗
" #
xi1
Let us partition the xi ’s , say xi = , such that
xi2
" #" # " #
B ∗ xi1 xi1
=λ , for 1 ≤ i ≤ k.
∗ ∗ xi2 xi2
As m > n − k, the size of xi2 is less than k. Thus, the set {x12 , . . . , xk2"} is# linearly dependent
y1
(see Corollary 3.3.9). So, there is a nonzero linear combination y = of x1 , . . . , xk such
y2
that y2 = 0. Notice that y1 6= 0 and By1 = λy1 .
n
Part 2⇒ Part 3. By Corollary 8.1.7, we know that PA0 (t) =
P
PA(i|i) (t). As A(i|i) is of size
i=1
n − 1, we get PA(i|i) (λ) = 0, for all i = 1, 2, . . . , n. Thus, PA0 (λ) = 0. A similar argument now
(2) (2) d
applied to each of the A(i|i)’s, gives PA (λ) = 0, where PA (t) = PA0 (t). Proceeding on above
dt
(i)
lines, we finally get PA (λ) = 0, for i = 0, 1, . . . , k − 1. This implies that Alg.Mulλ (A) ≥ k.
208CHAPTER 8. ADVANCED TOPICS ON DIAGONALIZABILITY AND TRIANGULARIZATION∗
Theorem 8.1.12. [Newton’s identities] Let P (t) = tn + an−1 tn−1 + · · · + a0 have zeros
n
λki . Then, for 1 ≤ k ≤ n,
P
λ1 , . . . , λn , counted with multiplicities. Put µk =
i=1
That is, the first n moments of the zeros determine the coefficients of P (t).
Proof. For simplicity of expression, let an = 1. Then, using Equation (8.1.4), we see that
k = 1 gives us an−1 = −µ1 . To compute an−2 , put k = 2 in Equation (8.1.4) to verify that
−µ2 +µ21
an−2 = 2 . This process can be continued to get all the coefficients of P (t). Now, let us
prove the n given equations.
P 1 P 0 (t)
Define f (t) = t−λi = P (t) and take |t| > max |λi |. Then, the left hand side can be
i i
re-written as
n n n h
X 1 X 1 X 1 λi i n µ
1
f (t) = = = + 2 + ··· = + 2 + ··· . (8.1.5)
t − λi λi t t t t
i=1 i=1 t 1 − i=1
t
T
hn µ1 ih i
nan tn−1 + (n − 1)an−1 tn−2 + · · · + a1 = P 0 (t) = + + · · · an tn
+ · · · + a0 .
DR
t t2
Now, equating the coefficient of tn−k−1 on both sides, we get
Remark 8.1.13. Let P (t) = an tn + · · · + a1 t + a0 with an = 1. Thus, we see that we need not
find the zeros of P (t) to find the k-th moments of the zeros of P (t). It can directly be computed
recursively using the Newton’s identities.
Exercise 8.1.14. Let A, B ∈ Mn (C). Then, prove that A and B have the same eigenvalues if
and only if tr(Ak ) = tr(Bk ), for k = 1, . . . , n.
Thus, the set G forms a group with respect to multiplication. We now define this group.
Theorem 8.2.3. [Generalizing a Unitary Matrix] Let A be an invertible matrix. Then A−1
is similar to A∗ if and only if there exists an invertible matrix B such that A = B −1 B ∗ .
some θ ∈ R.
Note that for any θ ∈ R, if we put Sθ = eiθ S then
That is, −e2iθ ∈ σ(S −1 S ∗ ). Thus, if we choose θ0 ∈ R such that −e2i(θ0 ) 6∈ σ(S −1 S ∗ ) then H( θ0 )
is nonsingular.
To get our result, we finally choose B = β(αI − A∗ )H( θ0 ) such that β 6= 0 and α = eiγ ∈
/
σ(A∗ ).
Note that with α and β chosen as above, B is invertible. Furthermore,
As we need, BA = B ∗ , we get βH( θ0 )(αA − I) = βH( θ0 )(αI − A) and thus, we need β = −βα,
which holds true if β = ei(π−γ)/2 . Thus, the required result follows.
Exercise 8.2.4. Suppose that A is similar to a unitary matrix. Then, prove that A−1 is similar
to A∗ .
210CHAPTER 8. ADVANCED TOPICS ON DIAGONALIZABILITY AND TRIANGULARIZATION∗
Definition 8.2.5. [Plane Rotations] For a fixed positive integer n, consider the vector space
Rn with standard basis {e1 , . . . , en }. Also, for 1 ≤ i, j ≤ n, let Ei,j = ei eTj . Then, for θ ∈ R
and 1 ≤ i, j ≤ n, a plane rotation, denoted U (θ; i, j), is defined as
U (θ; i, j) = I − Ei,i − Ej,j + [Ei,i + Ej,j ] cos θ − Ei,j sin θ + Ej,i sin θ.
1
.
..
cos θ − sin θ ← i-th row
That is, U (θ; i, j) = ..
, where the unmentioned
.
← j-th row
sin θ cos θ
..
.
1
diagonal entries are 1 and the unmentioned off-diagonal entries are 0.
Remark 8.2.6. Note the following about the matrix U (θ; i, j), where θ ∈ R and 1 ≤ i, j ≤ n.
5. Thus, for x ∈ Rn with xj ≠ 0, the choice of θ for which yj = 0, where y = U(θ; i, j)x, equals θ = cot^{−1}(−x_i/x_j).
6. For example, let us rotate the vector 1 = (1, 1, 1)^T ∈ R³ so that one of its coordinates becomes 0. Then, we can either apply a plane rotation along the xy-plane or the yz-plane. For the xy-plane, we need the plane z = 1 (the xy-plane lifted by 1). This plane contains the vector 1. Imagine moving the tip of 1 on this plane. Then this locus corresponds to a circle that lies on the plane z = 1, has radius √2 and is centred at (0, 0, 1). That is, we draw the circle x² + y² = 2 on the xy-plane and then lift it up by 1 so that it lies on the plane z = 1. Thus, note that the xz-plane cuts this circle at two points. These two points of intersection give us the two choices for the vector v (see Figure 8.1). A similar calculation can be done for the yz-plane.
8.2. METHODS FOR TRIDIAGONALIZATION AND DIAGONALIZATION 211
(Figure 8.1: the circle on the plane z = 1 and the two choices for the vector v.)
7. In general, in Rn, suppose that we want to apply a plane rotation to a along the x1x2-plane so that the resulting vector has 0 in the 2-nd coordinate. In that case, our circle on the x1x2-plane has radius r = √(a1² + a2²) and it gets translated by [0, 0, a3, · · · , an]^T. So, there are two points x on this circle with x2 = 0 and they are [±r, 0, a3, · · · , an]^T.
8. Consider three mutually orthogonal unit vectors, say x, y, z. Then, x can be brought to e1 by two plane rotations, namely by an appropriate U(θ1; 1, 3) and U(θ2; 1, 2). Thus, U(θ2; 1, 2)U(θ1; 1, 3)x = e1. Write ŷ = U(θ2; 1, 2)U(θ1; 1, 3)y and ẑ = U(θ2; 1, 2)U(θ1; 1, 3)z. As unitary transformations preserve angles, note that ŷ(1) = ẑ(1) = 0. Now, we can apply an appropriate plane rotation U(θ3; 2, 3) so that U(θ3; 2, 3)ŷ = e2. Since e3 is the only unit vector in R³ orthogonal to both e1 and e2, it follows that U(θ3; 2, 3)ẑ = e3. Thus,
I = [e1 e2 e3] = U(θ3; 2, 3)U(θ2; 1, 2)U(θ1; 1, 3) [x y z].
Hence, any real orthogonal matrix A ∈ M3 (R) is a product of three plane rotations.
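The following small NumPy sketch (our own helper, with 0-based indices; not taken from the book) builds U(θ; i, j) exactly as in Definition 8.2.5 and illustrates Remark 8.2.6.5 by choosing θ so that a prescribed coordinate of U(θ; i, j)x becomes 0.

    import numpy as np

    def plane_rotation(theta, i, j, n):
        # U(theta; i, j) of Definition 8.2.5, with 0-based i, j.
        U = np.eye(n)
        U[i, i] = U[j, j] = np.cos(theta)
        U[i, j] = -np.sin(theta)
        U[j, i] = np.sin(theta)
        return U

    x = np.array([3.0, 4.0, 12.0])
    i, j = 0, 1
    # one valid choice of theta zeroing the j-th coordinate of U x
    # (equivalent to theta = cot^{-1}(-x_i / x_j) up to the branch chosen):
    theta = np.arctan2(-x[j], x[i])
    U = plane_rotation(theta, i, j, 3)
    print(np.allclose(U.T @ U, np.eye(3)))   # U is orthogonal
    print(U @ x)                             # second coordinate is (numerically) 0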
We are now ready to give another method to get the QR-decomposition of a square matrix
(see Theorem 4.6.1 that uses the Gram-Schmidt Orthonormalization Process).
Proposition 8.2.7. [QR Factorization Revisited: Square Matrix] Let A ∈ Mn (R). Then
there exists a real orthogonal matrix Q and an upper triangular matrix R such that A = QR.
Proof. We start by applying plane rotations to A so that the positions (2, 1), (3, 1), . . . , (n, 1) of A become zero. That is, if a21 = 0, we multiply by I. Otherwise, we use the plane rotation U(θ; 1, 2), where θ = cot^{−1}(−a11/a21). Then, we apply a similar technique to A so that the
(3, 1) entry of A becomes 0. Note that this plane rotation doesn’t change the (2, 1) entry of A.
We continue this process till all the entries in the first column of A, except possibly the (1, 1) entry, are zero.
We then apply plane rotations to make the positions (3, 2), (4, 2), . . . , (n, 2) zero. Observe that this does not disturb the zeros in the first column. Thus, continuing the above process a finite number of times gives us the required result.
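The proof above is essentially an algorithm. The sketch below (our own NumPy code, not the book's; the tolerance and names are ours) zeroes the subdiagonal entries column by column with plane rotations and returns the resulting factorization A = QR.

    import numpy as np

    def qr_by_plane_rotations(A):
        # Build Q (orthogonal) and R (upper triangular) with A = Q R,
        # using plane rotations as in Proposition 8.2.7.
        R = np.array(A, dtype=float)
        n = R.shape[0]
        Q = np.eye(n)
        for col in range(n - 1):
            for row in range(col + 1, n):
                if abs(R[row, col]) > 1e-15:
                    theta = np.arctan2(-R[row, col], R[col, col])
                    c, s = np.cos(theta), np.sin(theta)
                    U = np.eye(n)
                    U[col, col] = U[row, row] = c
                    U[col, row] = -s
                    U[row, col] = s
                    R = U @ R              # zeroes the (row, col) entry
                    Q = Q @ U.T            # accumulate the inverse rotations
        return Q, R

    A = np.array([[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]])
    Q, R = qr_by_plane_rotations(A)
    print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(3)), np.allclose(np.tril(R, -1), 0))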
Lemma 8.2.8. [QR Factorization Revisited: Rectangular Matrix] Let A ∈ Mm,n (R). Then
there exists a real orthogonal matrix Q and a matrix R ∈ Mm,n (R) in upper triangular form
such that A = QR.
Proof. If RankA < m, add some columns to A to get a matrix, say à such that Rankà = m. Now
suppose that à has k columns. For 1 ≤ i ≤ k, let vi = Ã[:, i]. Now, apply the Gram-Schmidt
Orthonormalization Process to {v1, . . . , vk}. For example, suppose the result is a sequence of k vectors w1, 0, w2, 0, 0, . . . , 0, wm, 0, . . . , 0, where Q = [w1 · · · wm] is real orthogonal. Then
Ã[:, 1] is a linear combination of w1 , Ã[:, 2] is also a linear combination of w1 , Ã[:, 3] is a linear
combination of w1 , w2 and so on. In general, for 1 ≤ s ≤ k, the column Ã[:, s] is a linear
combination of the w_i's that appear in the list up to the s-th position. Thus, Ã[:, s] = Σ_{i=1}^{m} w_i r_{is},
where ris = 0 for all i > s. That is, Ã = QR, where R = [rij ]. Now, remove the extra columns
of à and the corresponding columns in R to get the required result.
Note that Proposition 8.2.7 is also valid for any complex matrix. In this case the matrix Q will be unitary. This can also be seen from Theorem 4.6.1 as we need to apply the Gram-Schmidt Orthonormalization Process.
Proof. If a31 ≠ 0, then put U1 = U(θ1; 2, 3), where θ1 = cot^{−1}(−a21/a31). Notice that U1^T[:, 1] = e1 and so
(U1AU1^T)[:, 1] = (U1A)[:, 1].
We already know that U1 A[3, 1] = 0. Hence, U1 AU1T is a real symmetric matrix with (3, 1)-
th entry 0. Now, proceed to make the (4, 1)-th entry of U1 A equal to 0. To do so, take
U2 = U(θ2; 2, 4). Notice that U2^T[:, 1] = e1 and so
(U2U1AU1^TU2^T)[:, 1] = (U2U1AU1^T)[:, 1].
But by our choice of the plane rotation U2, we have (U2U1AU1^T)[4, 1] = 0. Furthermore, as
U2 [3, :] = eT3 , we have
(U2 U1 AU1T )[3, 1] = U2 [3, :](U1 AU1T )[:, 1] = (U1 AU1T )[3, 1] = 0.
Proof. The idea is to reduce the off-diagonal entries of A to 0 as much as possible. So, we start by choosing i ≠ j such that i < j and |aij| is maximum. Now, put
θ = (1/2) cot^{−1}((aii − ajj)/(2aij)),  U = U(θ; i, j),  and  B = U^T AU.
Thus, using the above, we see that whenever l, k ∉ {i, j}, a_{lk}² = b_{lk}², and for l ∉ {i, j} we have b_{li}² + b_{lj}² = a_{li}² + a_{lj}², while b_{ij} = b_{ji} = 0. As the rest of the diagonal entries have not changed, we observe that the sum of the squares of the off-diagonal entries has reduced by 2a_{ij}². Thus, a repeated application of the above process
makes the matrix “close to diagonal”.
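The classical Jacobi iteration sketched in this proof is easy to prototype. The code below is our own illustration (NumPy assumed; the stopping rule, tolerance and 0-based indexing are ours): it repeatedly annihilates the off-diagonal entry of largest modulus using the rotation angle above.

    import numpy as np

    def jacobi_sketch(A, tol=1e-12, max_rotations=500):
        # Drive a real symmetric matrix "close to diagonal" by plane rotations.
        B = np.array(A, dtype=float)
        n = B.shape[0]
        for _ in range(max_rotations):
            off = np.abs(B - np.diag(np.diag(B)))
            i, j = np.unravel_index(np.argmax(off), off.shape)
            if off[i, j] < tol:
                break
            # theta = (1/2) cot^{-1}((b_ii - b_jj)/(2 b_ij)), written via arctan2
            theta = 0.5 * np.arctan2(2.0 * B[i, j], B[i, i] - B[j, j])
            c, s = np.cos(theta), np.sin(theta)
            U = np.eye(n)
            U[i, i] = U[j, j] = c
            U[i, j] = -s
            U[j, i] = s
            B = U.T @ B @ U                 # the (i, j) entry of B becomes 0
        return B

    A = np.array([[4.0, 1.0, 2.0], [1.0, 3.0, 0.5], [2.0, 0.5, 1.0]])
    B = jacobi_sketch(A)
    print(np.round(np.sort(np.diag(B)), 6))
    print(np.round(np.sort(np.linalg.eigvalsh(A)), 6))

The diagonal of the output approximates the spectrum of A, in line with the observation that the sum of squares of the off-diagonal entries decreases by 2a_{ij}² at every step.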
We will now look at another class of unitary matrices, commonly called the Householder matrices
(see Exercise 1.5.5.8).
Definition 8.2.11. [Householder Matrix] Let w ∈ Cn be a unit vector. Then, the matrix
Uw = I − 2ww∗ is called a Householder matrix.
Recall that if x, y ∈ Rn with x ≠ y and ‖x‖ = ‖y‖ then (x + y) ⊥ (x − y). This is not true in Cn, as can be seen from the following example. Take x = [1; 1] and y = [i; −1]. Then
⟨x + y, x − y⟩ = ⟨[1 + i; 0], [1 − i; 2]⟩ = (1 + i)² ≠ 0.
Thus, to pick the right choice for the matrix Uw, we need to be observant of the choice of the inner product space.
Example 8.2.14. Let x, y ∈ Cn with x ≠ y and ‖x‖ = ‖y‖. Then, which Uw should be used to reflect y to x?
1. Solution in case of Rn: Imagine the line segment joining x and y. Now, place a mirror at the midpoint and perpendicular to the line segment. Then, the reflection of y on that mirror is x. So, take w = (x − y)/‖x − y‖ ∈ Rn. Then,
Uw y = (I − 2ww^T)y = y − 2ww^T y = y − 2 [(x − y)/‖x − y‖²] (x − y)^T y = y − 2 [(x − y)/‖x − y‖²] (−‖x − y‖²/2) = x.
But, in that case, w = (x − y)/‖x − y‖ will work as, using the above, ‖x − y‖² = 2(y − x)∗y and
Uw y = (I − 2ww∗)y = y − 2ww∗y = y − 2 [(x − y)/‖x − y‖²] (x − y)∗y = y − 2 [(x − y)/‖x − y‖²] (−‖x − y‖²/2) = x.
Proof. If v = e1 then we proceed to apply our technique to the matrix B, a matrix of lower order. So, without loss of generality, we assume that v ≠ e1.
As we want Q^T AQ to be tri-diagonal, we need to find a vector w ∈ R^{n−1} such that Uw v = re1 ∈ R^{n−1}, where r = ‖v‖ = ‖Uw v‖. Thus, using Example 8.2.14, choose the required vector w ∈ R^{n−1}. Then, Q^T AQ is a symmetric matrix whose first row and first column have the form (∗, r, 0, . . . , 0) and whose trailing (n − 1) × (n − 1) principal block is S,
where S ∈ Mn−1(R) is a symmetric matrix. Now, use induction on the matrix S to get the required result.
Definition 8.2.16. Let s and t be two symbols. Then, an expression of the form W(s, t) = s^{m1}t^{n1}s^{m2}t^{n2} · · · s^{mk}t^{nk}, where the mi's and ni's are non-negative integers, is called a word in the symbols s and t.
Remark 8.2.17. [More on Unitary Equivalence] Let s and t be two symbols and W(s, t) be a word in the symbols s and t.
1. Suppose U is a unitary matrix such that B = U∗AU. Then, W(B, B∗) = U∗W(A, A∗)U. Thus, tr[W(A, A∗)] = tr[W(B, B∗)].
2. Let A and B be two matrices such that tr[W(A, A∗ )] = tr[W(B, B∗ )], for each word W .
Then, does it imply that A and B are unitarily equivalent? The answer is ‘yes’ as provided
by the following result. The proof is outside the scope of this book.
Exercise 8.2.19. [Triangularization via Complex Orthogonal Matrix need not be Possible] Let A ∈ Mn(C) and A = QTQ^T, where Q is a complex orthogonal matrix and T is upper triangular. Then, prove that
1. A has an eigenvector x such that x^T x ≠ 0.
2. there is no complex orthogonal matrix Q such that Q^T [1  i; i  −1] Q is upper triangular.
Proposition 8.2.20. [Matrices with Distinct Eigenvalues are Dense in Mn(C)] Let A ∈ Mn(C). Then, for each ε > 0, there exists a matrix A(ε) = [a(ε)ij] ∈ Mn(C) such that A(ε) has distinct eigenvalues and Σ_{i,j} |aij − a(ε)ij|² < ε.
Proof. By Schur Upper Triangularization (see Lemma 6.4.1), there exists a unitary matrix U such that U∗AU = T, an upper triangular matrix. Now, choose αi's such that the tii + αi are distinct and Σ_i |αi|² < ε. Now, consider the matrix A(ε) = U(T + diag(α1, . . . , αn))U∗. Then,
B = A(ε) − A = U diag(α1, . . . , αn)U∗ with
Σ_{i,j} |bij|² = tr(B∗B) = tr(U diag(|α1|², . . . , |αn|²)U∗) = Σ_i |αi|² < ε.
Before proceeding with our next result on almost diagonalizability, we look at the following
example.
" #
1 2
Example 8.2.21. Let A = and > 0 be given. Then, determine a diagonal matrix D
0 3
such that the non-diagonal entry of D−1 AD is less than .
Solution: Choose α < and define D = diag(1, α). Then,
2
" #" #" # " #
1 0 1 2 1 0 1 2α
D−1 AD = = .
0 α1 0 3 0 α 0 3
As α < , the required result follows.
2
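The same computation can be checked numerically; the short sketch below (NumPy assumed, names ours) scales A by D = diag(1, α) exactly as in the example.

    import numpy as np

    A = np.array([[1.0, 2.0], [0.0, 3.0]])
    eps = 1e-3
    alpha = eps / 4.0                        # any 0 < alpha < eps/2 works
    D = np.diag([1.0, alpha])
    B = np.linalg.inv(D) @ A @ D
    print(B)                                 # [[1, 2*alpha], [0, 3]]
    print(abs(B[0, 1]) < eps)                # True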
Proposition 8.2.22. [A Matrix is Almost Diagonalizable] Let A ∈ Mn(C) and ε > 0 be given. Then, there exists an invertible matrix S such that S^{−1}AS = T, an upper triangular matrix with |tij| < ε, for all i ≠ j.
Proof. By Schur Upper Triangularization (see Lemma 6.4.1), there exists a unitary matrix U such that U∗AU = T, an upper triangular matrix. Now, take t = ε/2 + max_{i<j} |tij| and choose α such that 0 < α < min{1, ε/t}. Then, if we take Dα = diag(1, α, α², . . . , α^{n−1}) and S = UDα, we have S^{−1}AS = Dα^{−1} T Dα = F (say), an upper triangular matrix. Furthermore, note that for i < j, we have |fij| = |tij| α^{j−i} < ε. Thus, the required result follows.
Definition 8.3.1. [Simultaneously Diagonalizable] Let A, B ∈ Mn (C). Then, they are said
to be simultaneously diagonalizable if there exists an invertible matrix S such that S −1 AS
and S −1 BS are both diagonal matrices.
Theorem 8.3.3. Let A, B ∈ Mn (C) be diagonalizable matrices. Then they are simultaneously
diagonalizable if and only if they commute.
Proof. One part of this theorem has already been proved in Proposition 8.3.2. For the other part, let us assume that AB = BA. Since A is diagonalizable, there exists an invertible matrix S such that
S −1 AS = Λ = λ1 I ⊕ · · · ⊕ λk I, (8.3.1)
where λ1, . . . , λk are the distinct eigenvalues of A. Since AB = BA, the matrix S^{−1}BS commutes with Λ and is therefore block diagonal, say S^{−1}BS = C11 ⊕ · · · ⊕ Ckk, where the block Cii has the same size as λiI. As B is diagonalizable, so is each Cii. So, choose invertible matrices Ti with Ti^{−1}CiiTi = Λi, a diagonal matrix, and put T = T1 ⊕ · · · ⊕ Tk. Then T^{−1}ΛT = Λ and
T^{−1}S^{−1}BST = (T1^{−1}C11T1) ⊕ · · · ⊕ (Tk^{−1}CkkTk) = Λ1 ⊕ · · · ⊕ Λk.
Thus A and B are simultaneously diagonalizable and the required result follows.
Theorem 8.3.7. Let F ⊆ Mn (C) be a commuting family of matrices. Then, all the matrices
in F have a common eigenvector.
Proof. We prove the result by induction on n. The result is clearly true for n = 1. So, let us
assume the result to be valid for all n < m. Now, let us assume that F ⊆ Mm (C) is a family of
diagonalizable matrices.
If F is simultaneously diagonalizable, then by Proposition 8.3.2, the family F is commuting.
Conversely, let F be a commuting family. If each A ∈ F is a scalar matrix then they are simul-
taneously diagonalizable via I. So, let A ∈ F be a non-scalar matrix. As A is diagonalizable,
there exists an invertible matrix S such that
S −1 AS = λ1 I ⊕ · · · ⊕ λk I, k ≥ 2,
Remark 8.3.9. [σ(AB) and σ(BA)] Let m ≤ n, A ∈ Mm×n (C), and B ∈ Mn×m (C). Then
σ(BA) = σ(AB) with n − m extra 0’s. In particular, if A, B ∈ Mn (C) then, PAB (t) = PBA (t).
4. Now, use continuity to argue that P_{AB}(t) = lim_{α→0+} P_{AαB}(t) = lim_{α→0+} P_{BAα}(t) = P_{BA}(t).
5. Let σ(A) = {λ1 , . . . , λn }, σ(B) = {µ1 , . . . , µn } and suppose that AB = BA. Then,
(a) prove that there is a permutation π such that σ(A+B) = {λ1 +µπ(1) , . . . , λn +µπ(n) }.
In particular, σ(A + B) ⊆ σ(A) + σ(B).
(b) if we further assume that σ(A) ∩ σ(−B) = ∅ then the matrix A + B is nonsingular.
6. Let A and B be two non-commuting matrices. Then, give an example to show that it is
difficult to relate σ(A + B) with σ(A) and σ(B).
7. Are the matrices A = [0  1  0; 0  0  −1; 0  0  0] and B = [0  0  0; 1  0  0; 0  1  0] simultaneously triangularizable?
8. Let F ⊆ Mn(C) be a family of commuting normal matrices. Then, prove that the matrices in F are simultaneously unitarily diagonalizable.
9. Let A ∈ Mn (C) with A∗ = A and x∗ Ax ≥ 0, for all x ∈ Cn . Then prove that σ(A) ⊆ R+
and if tr(A) = 0, then A = 0.
Proposition 8.3.11. [Triangularization: Real Matrix] Let A ∈ Mn(R). Then, there exists a real orthogonal matrix Q such that Q^T AQ is block upper triangular, where each diagonal block is of size either 1 or 2.
Proof. If all the eigenvalues of A are real then the corresponding eigenvectors have real entries
and hence, one can use induction to get the result in this case (see Lemma 6.4.1).
So, now let us assume that A has a complex eigenvalue, say λ = α + iβ with β ≠ 0 and x = u + iv as an eigenvector for λ. Thus, Ax = λx and hence Ax̄ = λ̄x̄. But, λ ≠ λ̄ as β ≠ 0. Thus, the eigenvectors x, x̄ are linearly independent and therefore, {u, v} is a linearly independent set. By the Gram-Schmidt Orthonormalization process, we get an ordered basis, say {w1, w2, . . . , wn} of Rn, where LS(w1, w2) = LS(u, v). Also, using the eigen-condition Ax = λx gives
Aw1 = aw1 + bw2,   Aw2 = cw1 + dw2,
for some a, b, c, d ∈ R. Hence, with Q = [w1 w2 · · · wn], the matrix Q^T AQ has the block form [C  ∗; 0  B] with C ∈ M2(R),
where B ∈ Mn−2 (R). Now, by induction hypothesis the required result follows.
The next result is a direct application of Proposition 8.3.11 and hence the proof is omitted.
Proposition 8.3.13. Let A ∈ Mn (R). Then the following statements are equivalent.
1. A is normal.
2. There exists a real orthogonal matrix Q such that Q^T AQ = ⊕_i Ai, where the Ai's are real normal matrices of size either 1 or 2.
Proof. 2 ⇒ 1 is trivial. To prove 1 ⇒ 2, recall that Proposition 8.3.11 gives the existence of a real orthogonal matrix Q such that Q^T AQ is block upper triangular with diagonal blocks of size either 1 or 2. So, we can write
Q^T AQ = [R  C; 0  B] (say), where R is a p × p upper triangular matrix with diagonal entries λ1, . . . , λp and B is a block upper triangular matrix whose diagonal blocks A11, . . . , Akk have size 2.
Remark 8.3.16. 1. Let A be a diagonalizable matrix with ρ(A) < 1. Then, A is a convergent
matrix.
Proof. Let A = S diag(λ1, . . . , λn)S^{−1}. As ρ(A) < 1, for each i, 1 ≤ i ≤ n, λ_i^m → 0 as m → ∞. Thus, A^m = S diag(λ1^m, . . . , λn^m)S^{−1} → 0.
2. Even if the matrix A is not diagonalizable, the above result holds. That is, whenever
ρ(A) < 1, the matrix A is convergent. The converse is also true.
Proof. Let Jk(λ) = λIk + Nk be a Jordan block of J, the Jordan canonical form of A. Then, as N_k^k = 0, for each fixed k, we have
Jk(λ)^m = Σ_{j=0}^{k−1} (m choose j) λ^{m−j} N_k^j → 0, as m → ∞, whenever |λ| < 1.
Theorem 8.3.17. [Decomposition into Diagonalizable and Nilpotent Parts] Let A ∈ Mn(C). Then A = B + C, where B is a diagonalizable matrix and C is nilpotent such that BC = CB.
Proof. Let A = SJS^{−1}, where J is the Jordan canonical form of A. Write J = D + N, where D is the diagonal part of J and N = J − D. Then, B = SDS^{−1} is a diagonalizable matrix and C = SNS^{−1} is a nilpotent matrix. Now, note that DN = ND as for each Jordan block Jk(λ) = Dk + Nk, we have Dk = λI and hence DkNk = NkDk. Thus, BC = CB and A = B + C, as required.
Appendix
9.1 Uniqueness of RREF

Remark 9.1.2. Recall that in Remark 9.2.16.1, it was observed that each permutation is a product of the transpositions (1, 2), (1, 3), . . . , (1, n). Thus, the following hold.
1. Verify that the elementary matrix Eij is the permutation matrix corresponding to the transposition (i, j).
2. Thus, every permutation matrix is a product of elementary matrices E1j , 1 ≤ j ≤ n.
3. For n = 3, the permutation matrices are I3, [1  0  0; 0  0  1; 0  1  0] = E23 = E12E13E12, [0  1  0; 1  0  0; 0  0  1] = E12, [0  1  0; 0  0  1; 1  0  0] = E12E13, [0  0  1; 1  0  0; 0  1  0] = E13E12 and [0  0  1; 0  1  0; 1  0  0] = E13.
4. Let f ∈ Sn and P^f = [pij] be the corresponding permutation matrix. Since pij = δ_{f(i), j} and {f(1), . . . , f(n)} = [n], each entry of P^f is either 0 or 1. Furthermore, every row and column of P^f has exactly one nonzero entry. This nonzero entry is a 1 and appears at the position (i, f(i)).
5. By the previous paragraph, we see that when a permutation matrix multiplies A
(a) from the left, it permutes the rows of A;
(b) from the right, it permutes the columns of A.
6. P is a permutation matrix if and only if P has exactly one 1 in each row and column.
Solution: If P has exactly one 1 in each row and column, then P is a square matrix, say
n × n. Now, apply GJE to P . The occurrence of exactly one 1 in each row and column
implies that these 1’s are the pivots in each column. We just need to interchange rows to
get it in RREF. So, we need to multiply by Eij . Thus, GJE of P is In and P is indeed a
product of Eij ’s. The other part has already been explained earlier.
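Items 4-6 are easy to see in code. In the sketch below (our own conventions: the permutation f is stored as a 0-based list p with p[i] = f(i), and NumPy is assumed), the matrix P has its 1's at the positions (i, f(i)), so multiplying from the left permutes rows and from the right permutes columns.

    import numpy as np

    def permutation_matrix(p):
        # P[i, p[i]] = 1 and all other entries are 0.
        n = len(p)
        P = np.zeros((n, n))
        for i, fi in enumerate(p):
            P[i, fi] = 1.0
        return P

    p = [1, 2, 0]                            # the 3-cycle 0 -> 1 -> 2 -> 0
    P = permutation_matrix(p)
    A = np.arange(9.0).reshape(3, 3)
    print(P @ A)                             # rows of A permuted
    print(A @ P)                             # columns of A permuted
    print(np.allclose(P.T @ P, np.eye(3)))   # exactly one 1 per row and column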
Theorem 9.1.3. Let A and B be two matrices in RREF. If they are row equivalent then A = B.
Proof. Note that the matrix A = 0 if and only if B = 0. So, let us assume that the matrices
A, B 6= 0. Also, the row-equivalence of A and B implies that there exists an invertible matrix
C such that A = CB, where C is product of elementary matrices.
Since B is in RREF, either B[:, 1] = 0T or B[:, 1] = (1, 0, . . . , 0)T . If B[:, 1] = 0T then
A[:, 1] = CB[:, 1] = C0 = 0. If B[:, 1] = (1, 0, . . . , 0)T then A[:, 1] = CB[:, 1] = C[:, 1]. As C is
invertible, the first column of C cannot be the zero vector. So, A[:, 1] cannot be the zero vector.
Further, A is in RREF implies that A[:, 1] = (1, 0, . . . , 0)T . So, we have shown that if A and B
are row-equivalent then their first columns must be the same.
Now, let us assume that the first k − 1 columns of A and B are equal and that they contain r pivotal columns. We will now show that the k-th column is also the same.
Define Ak = [A[:, 1], . . . , A[:, k]] and Bk = [B[:, 1], . . . , B[:, k]]. Then, our assumption implies
that A[:, i] = B[:, i], for 1 ≤ i ≤ k − 1. Since, the first k − 1 columns contain r pivotal columns,
there exists a permutation matrix P such that
AkP = [Ir  W  A[:, k]; 0  0]  and  BkP = [Ir  W  B[:, k]; 0  0].
" # If the k-th columns of A and B are pivotal columns then by definition of RREF, A[:, k] =
0
= B[:, k], where 0 is a vector of size r and e1 = (1, 0, . . . , 0)T . So, we need to consider two
e1
cases depending on whether both are non-pivotal or one is pivotal and the other is not.
As A = CB, we get Ak = CBk and
" # " #" # " #
Ir W A[:, k] C1 C2 Ir W B[:, k] C1 C1 W CB[:, k]
= Ak P = CBk P = = .
0 0 C3 C4 0 0 C3 C3 W
" #
I r C2
So, we see that C1 = Ir , C3 = 0 and A[:, k] = B[:, k].
0 C4
Case 1: Neither A[:, k] nor B[:, k] is pivotal. Then
[X; 0] = A[:, k] = [Ir  C2; 0  C4] B[:, k] = [Ir  C2; 0  C4] [Y; 0] = [Y; 0].
Thus, X = Y and in this case the k-th columns are equal.
Case 2: A[:, k] is pivotal but B[:, k] is non-pivotal. Then
[0; e1] = A[:, k] = [Ir  C2; 0  C4] B[:, k] = [Ir  C2; 0  C4] [Y; 0] = [Y; 0],
a contradiction as e1 ≠ 0. Thus, this case cannot arise.
Therefore, combining both the cases, we get the required result.
9.2 Permutation/Symmetric Groups
Example 9.2.2. Let A = {1, 2, 3}, B = {a, b, c, d} and C = {α, β, γ}. Then, the function
1. j : A → B defined by j(1) = a, j(2) = c and j(3) = c is neither one-one nor onto.
2. f : A → B defined by f (1) = a, f (2) = c and f (3) = d is one-one but not onto.
3. g : B → C defined by g(a) = α, g(b) = β, g(c) = α and g(d) = γ is onto but not one-one.
4. h : B → A defined by h(a) = 2, h(b) = 2, h(c) = 3 and h(d) = 1 is onto.
5. h ◦ f : A → A is a bijection.
6. g ◦ f : A → C is neither one-one nor onto.
Exercise 9.2.5. Let S3 be the set consisting of all permutations of 3 elements. Then, prove that S3 has 6 elements. Moreover, each of them equals one of the 6 functions given below.
1. f1 (1) = 1, f1 (2) = 2 and f1 (3) = 3.
2. f2 (1) = 1, f2 (2) = 3 and f2 (3) = 2.
3. f3 (1) = 2, f3 (2) = 1 and f3 (3) = 3.
4. f4 (1) = 2, f4 (2) = 3 and f4 (3) = 1.
5. f5 (1) = 3, f5 (2) = 1 and f5 (3) = 2.
6. f6 (1) = 3, f6 (2) = 2 and f6 (3) = 1.
Remark 9.2.6. Let f : [n] → [n] be a bijection. Then, the inverse of f, denoted f^{−1} and defined by f^{−1}(m) = ℓ whenever f(ℓ) = m, is well defined for every m ∈ [n], and f^{−1} is a bijection. For example, in Exercise 9.2.5, note that fi^{−1} = fi, for i = 1, 2, 3, 6 and f4^{−1} = f5.
Remark 9.2.7. Let Sn = {f : f is a permutation of [n]}. Then, Sn has n! elements and forms a group with respect to composition of functions, called the product, due to the following.
1. Let f ∈ Sn. Then,
(a) f can be written as f = [1  2  · · ·  n; f(1)  f(2)  · · ·  f(n)], called a two row notation.
(b) f is one-one. Hence, {f(1), f(2), . . . , f(n)} = [n] and thus, f(1) ∈ [n], f(2) ∈ [n] \ {f(1)}, . . . and finally {f(n)} = [n] \ {f(1), . . . , f(n−1)}. Therefore, there are n choices for f(1), n − 1 choices for f(2) and so on. Hence, the number of elements in Sn equals n(n − 1) · · · 2 · 1 = n!.
4. Sn has a special permutation called the identity permutation, denoted Idn , such that
Idn (i) = i, for 1 ≤ i ≤ n.
Lemma 9.2.8. Fix a positive integer n. Then, the group Sn satisfies the following:
1. For each fixed f ∈ Sn, Sn = {f ◦ g : g ∈ Sn} = {g ◦ f : g ∈ Sn}.
2. Sn = {g^{−1} : g ∈ Sn}.
Proof. Part 1: Note that for each α ∈ Sn the functions f^{−1} ◦ α, α ◦ f^{−1} ∈ Sn and α = f ◦ (f^{−1} ◦ α) as well as α = (α ◦ f^{−1}) ◦ f.
Part 2: Note that for each f ∈ Sn, by definition, (f^{−1})^{−1} = f. Hence the result holds.
Definition 9.2.9. Let f ∈ Sn. Then, the number of inversions of f, denoted n(f), equals the number of pairs (i, j) with 1 ≤ i < j ≤ n and f(i) > f(j).
3. Let f = (1, 3, 5, 4) and g = (2, 4, 1) be two cycles. Then, their product, denoted f ◦ g or
(1, 3, 5, 4)(2, 4, 1) equals (1, 2)(3, 5, 4). The calculation proceeds as (the arrows indicate the
images):
1 → 2. Note (f ◦ g)(1) = f (g(1)) = f (2) = 2.
2 → 4 → 1 as (f ◦ g)(2) = f (g(2)) = f (4) = 1. So, (1, 2) forms a cycle.
3 → 5 as (f ◦ g)(3) = f (g(3)) = f (3) = 5.
5 → 4 as (f ◦ g)(5) = f (g(5)) = f (5) = 4.
4 → 1 → 3 as (f ◦ g)(4) = f (g(4)) = f (1) = 3. So, the other cycle is (3, 5, 4).
4. Let f = (1, 4, 5) and g = (2, 4, 1) be two permutations. Then, (1, 4, 5)(2, 4, 1) = (1, 2, 5)(4) =
(1, 2, 5) as 1 → 2, 2 → 4 → 5, 5 → 1, 4 → 1 → 4 and
(2, 4, 1)(1, 4, 5) = (1)(2, 4, 5) = (2, 4, 5) as 1 → 4 → 1, 2 → 4, 4 → 5, 5 → 1 → 2.
5. Even though [1  2  3  4  5; 4  3  2  5  1] is not a cycle, verify that it is a product of the cycles (1, 4, 5) and (2, 3).
2. in general, the r-cycle (i1, . . . , ir) = (1, i1)(1, ir)(1, ir−1) · · · (1, i2)(1, i1).
3. So, every r-cycle can be written as a product of transpositions. Furthermore, they can be written using the n − 1 transpositions (1, 2), (1, 3), . . . , (1, n).
With the above definitions, we state and prove two important results.
Proof. Note that, using Remark 9.2.14, we just need to show that f can be written as a product of disjoint cycles.
Consider the set S = {1, f(1), f^{(2)}(1) = (f ◦ f)(1), f^{(3)}(1) = (f ◦ (f ◦ f))(1), . . .}. As this list has infinitely many terms while each f^{(i)}(1) ∈ [n], there exist i, j with 0 ≤ i < j ≤ n such that f^{(i)}(1) = f^{(j)}(1).
Now, let j1 be the least positive integer such that f (i) (1) = f (j1 ) (1), for some i, 0 ≤ i < j1 .
Then, we claim that i = 0.
For, if i ≥ 1 then j1 − 1 ≥ 1 and the condition that f is one-one gives
f^{(i−1)}(1) = (f^{−1} ◦ f^{(i)})(1) = f^{−1}(f^{(i)}(1)) = f^{−1}(f^{(j1)}(1)) = (f^{−1} ◦ f^{(j1)})(1) = f^{(j1−1)}(1).
Thus, we see that the repetition has occurred at the (j1 − 1)-th instant, contradicting the
assumption that j1 was the least such positive integer. Hence, we conclude that i = 0. Thus,
(1, f (1), f (2) (1), . . . , f (j1 −1) (1)) is one of the cycles in f .
Now, choose i1 ∈ [n] \ {1, f(1), f^{(2)}(1), . . . , f^{(j1−1)}(1)} and proceed as above to get another cycle. Let the new cycle be (i1, f(i1), . . . , f^{(j2−1)}(i1)). Then, the fact that f is one-one implies that
{1, f(1), f^{(2)}(1), . . . , f^{(j1−1)}(1)} ∩ {i1, f(i1), . . . , f^{(j2−1)}(i1)} = ∅.
So, the above process needs to be repeated at most n times to get all the disjoint cycles. Thus,
the required result follows.
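The procedure in this proof translates almost verbatim into code. The sketch below (our own function; the permutation is stored as a Python dict on {1, . . . , n}) follows the orbit of each element that has not yet been visited and returns the cycles of length greater than 1.

    def disjoint_cycles(f):
        # f: dict mapping {1, ..., n} bijectively to itself.
        visited = set()
        cycles = []
        for start in sorted(f):
            if start in visited:
                continue
            cycle = [start]
            visited.add(start)
            x = f[start]
            while x != start:
                cycle.append(x)
                visited.add(x)
                x = f[x]
            if len(cycle) > 1:               # cycles of length 1 are suppressed
                cycles.append(tuple(cycle))
        return cycles

    # the permutation sending 1->4, 4->5, 5->1, 6->9, 9->6, 7->8, 8->7
    f = {1: 4, 2: 2, 3: 3, 4: 5, 5: 1, 6: 9, 7: 8, 8: 7, 9: 6}
    print(disjoint_cycles(f))                # [(1, 4, 5), (6, 9), (7, 8)]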
Remark 9.2.16. Note that when one writes a permutation as product of disjoint cycles, cycles
of length 1 are suppressed so as to match Definition 9.2.11. For example, the algorithm in the
proof of Theorem 9.2.15 implies
1. Using Remark 9.2.14.3, we see that every permutation can be written as a product of the n − 1 transpositions (1, 2), (1, 3), . . . , (1, n).
2. [1  2  3  4  5; 1  4  3  5  2] = (1)(2, 4, 5)(3) = (2, 4, 5).
3. [1  2  3  4  5  6  7  8  9; 4  2  3  5  1  9  8  7  6] = (1, 4, 5)(2)(3)(6, 9)(7, 8) = (1, 4, 5)(6, 9)(7, 8).
Note that Id3 = (1, 2)(1, 2) = (1, 2)(2, 3)(1, 2)(1, 3) as well. The question arises: is it possible to write Idn as a product of an odd number of transpositions? The next lemma answers this question in the negative.
Lemma 9.2.17. Suppose there exist transpositions f1, . . . , ft such that
Idn = f1 ◦ f2 ◦ · · · ◦ ft.
Then, t is even.
Proof. We will prove the result by mathematical induction on t. Observe that t ≠ 1 as Idn is not a transposition. Hence, t ≥ 2. If t = 2, we are done. So, let us assume that the result holds whenever Idn is written as a composition of fewer than t transpositions.
Suppose f = g1 ◦ g2 ◦ · · · ◦ gk = h1 ◦ h2 ◦ · · · ◦ hℓ, where the gi's and hj's are transpositions. Then
Idn = g1 ◦ g2 ◦ · · · ◦ gk ◦ hℓ ◦ hℓ−1 ◦ · · · ◦ h1.
Hence, by Lemma 9.2.17, k + ℓ is even. Thus, either k and ℓ are both even or both odd.
Definition 9.2.20. Observe that if f and g are both even or both odd permutations, then f ◦ g and g ◦ f are both even. Whereas, if one of them is odd and the other even then f ◦ g and g ◦ f are both odd. We use this to define a function sgn : Sn → {1, −1}, called the signature of a permutation, by
sgn(f) = 1 if f is an even permutation, and sgn(f) = −1 if f is an odd permutation.
Example 9.2.21. Consider the set Sn . Then,
3. using Remark 9.2.20, sgn(f ◦ g) = sgn(f ) · sgn(g) for any two permutations f, g ∈ Sn .
Definition 9.2.22. Let A = [aij] be an n × n matrix with complex entries. Then, the determinant of A, denoted det(A), is defined by
det(A) = Σ_{σ∈Sn} sgn(σ) Π_{i=1}^{n} a_{iσ(i)}.   (9.2.2)
Observe that det(A) is a scalar quantity. Even though the expression for det(A) seems complicated at first glance, it is very helpful in proving results related to the "properties of determinant". We will do so in the next section. As another example, we verify that this definition also matches the familiar one for 3 × 3 matrices. So, let A = [aij] be a 3 × 3 matrix. Then, using
Equation (9.2.2),
det(A) = Σ_{σ∈S3} sgn(σ) Π_{i=1}^{3} a_{iσ(i)}
= sgn(f1) Π_{i=1}^{3} a_{if1(i)} + sgn(f2) Π_{i=1}^{3} a_{if2(i)} + sgn(f3) Π_{i=1}^{3} a_{if3(i)} + sgn(f4) Π_{i=1}^{3} a_{if4(i)} + sgn(f5) Π_{i=1}^{3} a_{if5(i)} + sgn(f6) Π_{i=1}^{3} a_{if6(i)}
= a11a22a33 − a11a23a32 − a12a21a33 + a12a23a31 + a13a21a32 − a13a22a31.
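For small matrices, Definition 9.2.22 can be evaluated literally. The sketch below (our own code; itertools and NumPy assumed) computes the signature from the number of inversions and sums over all of Sn, reproducing the 3 × 3 expansion above as well as NumPy's determinant.

    import numpy as np
    from itertools import permutations

    def sgn(p):
        # signature via the parity of the number of inversions
        inv = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
        return -1 if inv % 2 else 1

    def det_by_permutations(A):
        n = len(A)
        return sum(sgn(p) * np.prod([A[i][p[i]] for i in range(n)])
                   for p in permutations(range(n)))

    A = np.array([[2.0, 1.0, 0.0], [1.0, 3.0, 4.0], [0.0, 5.0, 6.0]])
    print(det_by_permutations(A), np.linalg.det(A))   # both equal -10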
9.3 Properties of Determinant
5. Let B and C be two n×n matrices. If there exists m ∈ [n] such that B[i, :] = C[i, :] = A[i, :]
for all i 6= m and C[m, :] = A[m, :] + B[m, :] then det(C) = det(A) + det(B).
7. If A is a triangular matrix then det(A) = a11 · · · ann , the product of the diagonal entries.
Proof. Part 1: Note that each term in det(A) contains one entry from each row. So, each term has an entry from A[i, :] = 0^T. Hence, each term is itself zero. Thus, det(A) = 0.
Part 2: By assumption, B[k, :] = A[k, :] for k 6= i and B[i, :] = cA[i, :]. So,
det(B) = Σ_{σ∈Sn} sgn(σ) Π_{k≠i} b_{kσ(k)} · b_{iσ(i)} = Σ_{σ∈Sn} sgn(σ) Π_{k≠i} a_{kσ(k)} · c a_{iσ(i)}
= c Σ_{σ∈Sn} sgn(σ) Π_{k=1}^{n} a_{kσ(k)} = c det(A).
Part 3: Let τ = (i, j). Then, sgn(τ) = −1, by Lemma 9.2.8, Sn = {σ ◦ τ : σ ∈ Sn} and
det(B) = Σ_{σ∈Sn} sgn(σ) Π_{i=1}^{n} b_{iσ(i)} = Σ_{σ◦τ∈Sn} sgn(σ ◦ τ) Π_{i=1}^{n} b_{i,(σ◦τ)(i)}
= Σ_{σ◦τ∈Sn} sgn(τ) · sgn(σ) Π_{k≠i,j} b_{kσ(k)} · b_{i(σ◦τ)(i)} b_{j(σ◦τ)(j)}
= sgn(τ) Σ_{σ∈Sn} sgn(σ) Π_{k≠i,j} b_{kσ(k)} · b_{iσ(j)} b_{jσ(i)} = −Σ_{σ∈Sn} sgn(σ) Π_{k=1}^{n} a_{kσ(k)}
= −det(A).
Part 4: As A[i, :] = A[j, :], A = Eij A. Hence, by Part 3, det(A) = − det(A). Thus, det(A) = 0.
Part 5: By assumption, C[i, :] = B[i, :] = A[i, :] for i ≠ m and C[m, :] = B[m, :] + A[m, :]. So,
det(C) = Σ_{σ∈Sn} sgn(σ) Π_{i=1}^{n} c_{iσ(i)} = Σ_{σ∈Sn} sgn(σ) Π_{i≠m} c_{iσ(i)} · c_{mσ(m)}
= Σ_{σ∈Sn} sgn(σ) Π_{i≠m} c_{iσ(i)} · (a_{mσ(m)} + b_{mσ(m)})
= Σ_{σ∈Sn} sgn(σ) Π_{i=1}^{n} a_{iσ(i)} + Σ_{σ∈Sn} sgn(σ) Π_{i=1}^{n} b_{iσ(i)} = det(A) + det(B).
Part 6: By assumption, B[k, :] = A[k, :] for k ≠ i and B[i, :] = A[i, :] + cA[j, :]. So,
det(B) = Σ_{σ∈Sn} sgn(σ) Π_{k=1}^{n} b_{kσ(k)} = Σ_{σ∈Sn} sgn(σ) Π_{k≠i} b_{kσ(k)} · b_{iσ(i)}
= Σ_{σ∈Sn} sgn(σ) Π_{k≠i} a_{kσ(k)} · (a_{iσ(i)} + c a_{jσ(j)})
= Σ_{σ∈Sn} sgn(σ) Π_{k≠i} a_{kσ(k)} · a_{iσ(i)} + c Σ_{σ∈Sn} sgn(σ) Π_{k≠i} a_{kσ(k)} · a_{jσ(j)}
= Σ_{σ∈Sn} sgn(σ) Π_{k=1}^{n} a_{kσ(k)} + c · 0 = det(A), using Part 4.
Part 7: Observe that if σ ∈ Sn and σ ≠ Idn then n(σ) ≥ 1. Thus, for every σ ≠ Idn, there exists m ∈ [n] (depending on σ) such that either m > σ(m) or m < σ(m). So, if A is triangular, a_{mσ(m)} = 0 for such an m. So, for each σ ≠ Idn, Π_{i=1}^{n} a_{iσ(i)} = 0. Hence, det(A) = Π_{i=1}^{n} a_{ii} and the result follows.
Part 8: Using Part 7, det(In) = 1. By definition Eij = EijIn, Ei(c) = Ei(c)In and Eij(c) = Eij(c)In, for c ≠ 0. Thus, using Parts 2, 3 and 6, we get det(Ei(c)) = c, det(Eij) = −1 and det(Eij(c)) = 1. Also, again using Parts 2, 3 and 6, we get det(EA) = det(E) det(A).
Part 9: Suppose A is invertible. Then, by Theorem 2.7.1, A = E1 · · · Ek , for some elementary
matrices E1 , . . . , Ek . So, a repeated application of Part 8 implies det(A) = det(E1 ) · · · det(Ek ) 6=
0 as det(Ei ) 6= 0 for 1 ≤ i ≤ k.
Now, suppose that det(A) ≠ 0. We need to show that A is invertible. On the contrary, assume that A is not invertible. Then, by Theorem 2.7.1, Rank(A) < n. So, by Proposition 2.4.9, there exist elementary matrices E1, . . . , Ek such that E1 · · · Ek A = [B; 0]. Therefore, Part 1 and a repeated application of Part 8 give
det(E1) · · · det(Ek) det(A) = det(E1 · · · Ek A) = det([B; 0]) = 0,
which is a contradiction as det(Ei) ≠ 0 for each i. Hence, A is invertible.
Part 10: If A is invertible then, by Theorem 2.7.1, A = E1 · · · Ek for some elementary matrices and hence, by Part 8, det(AB) = det(E1 · · · Ek B) = det(E1) · · · det(Ek) det(B) = det(A) det(B). In case A is not invertible, by Part 9, det(A) = 0. Also, AB is not invertible (if AB were invertible then A would be invertible, using the rank argument). So, again by Part 9, det(AB) = 0. Thus, det(AB) = det(A) det(B).
Part 11: Let B = [bij ] = AT . Then, bij = aji , for 1 ≤ i, j ≤ n. By Lemma 9.2.8, we know that
Sn = {σ −1 : σ ∈ Sn }. As σ ◦ σ −1 = Idn , sgn(σ) = sgn(σ −1 ). Hence,
det(B) = Σ_{σ∈Sn} sgn(σ) Π_{i=1}^{n} b_{iσ(i)} = Σ_{σ∈Sn} sgn(σ) Π_{i=1}^{n} a_{σ(i),i} = Σ_{σ^{−1}∈Sn} sgn(σ^{−1}) Π_{i=1}^{n} a_{iσ^{−1}(i)}
= det(A).
Remark 9.3.2. 1. As det(A) = det(AT ), we observe that in Theorem 9.3.1, the condition
on “row” can be replaced by the condition on “column”.
2. Let A = [aij ] be a matrix satisfying a1j = 0, for 2 ≤ j ≤ n. Let B = A(1 | 1), the submatrix
of A obtained by removing the first row and the first column. Then det(A) = a11 det(B).
Proof: Let σ ∈ Sn with σ(1) = 1. Then, σ has a cycle (1). So, a disjoint cycle represen-
tation of σ only has numbers {2, 3, . . . , n}. That is, we can think of σ as an element of
Sn−1. Hence,
det(A) = Σ_{σ∈Sn} sgn(σ) Π_{i=1}^{n} a_{iσ(i)} = Σ_{σ∈Sn, σ(1)=1} sgn(σ) Π_{i=1}^{n} a_{iσ(i)}
= a11 Σ_{σ∈Sn, σ(1)=1} sgn(σ) Π_{i=2}^{n} a_{iσ(i)} = a11 Σ_{σ∈Sn−1} sgn(σ) Π_{i=1}^{n−1} b_{iσ(i)} = a11 det(B).
We now relate this definition of determinant with the one given in Definition 2.8.1.
Theorem 9.3.3. Let A be an n × n matrix. Then, det(A) = Σ_{j=1}^{n} (−1)^{1+j} a_{1j} det(A(1 | j)), where recall that A(1 | j) is the submatrix of A obtained by removing the 1st row and the j-th column.
Proof. For 1 ≤ j ≤ n, define an n × n matrix
Bj = [0  0  · · ·  a1j  · · ·  0 ; a21  a22  · · ·  a2j  · · ·  a2n ; · · · ; an1  an2  · · ·  anj  · · ·  ann].
Also, for each matrix Bj, we define the n × n matrix Cj by
1. Cj [:, 1] = Bj [:, j],
2. Cj [:, i] = Bj [:, i − 1], for 2 ≤ i ≤ j and
3. Cj [:, k] = Bj [:, k] for k ≥ j + 1.
Also, observe that Bj ’s have been defined to satisfy B1 [1, :] + · · · + Bn [1, :] = A[1, :] and
Bj [i, :] = A[i, :] for all i ≥ 2 and 1 ≤ j ≤ n. Thus, by Theorem 9.3.1.5,
det(A) = Σ_{j=1}^{n} det(Bj).   (9.3.1)
Let us now compute det(Bj), for 1 ≤ j ≤ n. Note that Cj is obtained from Bj by j − 1 interchanges of adjacent columns. Then, by Theorem 9.3.1.3 (applied to columns, see Remark 9.3.2.1), we get det(Bj) = (−1)^{j−1} det(Cj). So, using Remark 9.3.2.2, Theorem 9.3.1.2 and Equation (9.3.1), we have
det(A) = Σ_{j=1}^{n} (−1)^{j−1} det(Cj) = Σ_{j=1}^{n} (−1)^{j+1} a1j det(A(1 | j)).
Thus, we have shown that the determinant defined in Definition 2.8.1 is valid.
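Theorem 9.3.3 can also be cross-checked numerically: expanding recursively along the first row gives the same value as NumPy's determinant. A short sketch (our own helper; NumPy assumed):

    import numpy as np

    def det_first_row_expansion(A):
        # det(A) = sum_j (-1)^(1+j) a_{1j} det(A(1|j)), as in Theorem 9.3.3.
        A = np.asarray(A, dtype=float)
        n = A.shape[0]
        if n == 1:
            return A[0, 0]
        total = 0.0
        for j in range(n):
            minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)   # A(1 | j)
            total += (-1) ** j * A[0, j] * det_first_row_expansion(minor)
        return total

    A = np.array([[2.0, 1.0, 0.0], [1.0, 3.0, 4.0], [0.0, 5.0, 6.0]])
    print(det_first_row_expansion(A), np.linalg.det(A))   # both equal -10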
9.4 Dimension of W1 + W2
Theorem 9.4.1. Let V be a finite dimensional vector space over F and let W1 and W2 be two subspaces of V. Then, dim(W1) + dim(W2) = dim(W1 + W2) + dim(W1 ∩ W2).
Proof. Let B = {u1, . . . , ur} be a basis of W1 ∩ W2. Extend B to a basis B1 = {u1, . . . , ur, v1, . . . , vt} of W1 and to a basis B2 = {u1, . . . , ur, w1, . . . , ws} of W2, and put D = {u1, . . . , ur, v1, . . . , vt, w1, . . . , ws}. We claim that
1. D is linearly independent, and
2. LS(D) = W1 + W2.
The second part can be easily verified. For the first part, consider the linear system
α1 u1 + · · · + αr ur + β1 w1 + · · · + βs ws + γ1 v1 + · · · + γt vt = 0 (9.4.2)
α1 u1 + · · · + αr ur + β1 w1 + · · · + βs ws = −(γ1 v1 + · · · + γt vt ).
Then, v := β1w1 + · · · + βsws = −(α1u1 + · · · + αrur + γ1v1 + · · · + γtvt) ∈ LS(B1) = W1. Also, v = β1w1 + · · · + βsws ∈ LS(B2) = W2.
Hence, v ∈ W1 ∩ W2 and therefore, there exist scalars δ1, . . . , δr such that v = Σ_{j=1}^{r} δjuj. Substituting this representation of v in Equation (9.4.2), we get
(α1 + δ1)u1 + · · · + (αr + δr)ur + γ1v1 + · · · + γtvt = 0,
which, by the linear independence of B1, forces γj = 0 for 1 ≤ j ≤ t. Equation (9.4.2) then reduces to α1u1 + · · · + αrur + β1w1 + · · · + βsws = 0 and hence, by the linear independence of B2, αi = 0 for 1 ≤ i ≤ r and βk = 0 for 1 ≤ k ≤ s. Hence, we see that
the linear system of Equations (9.4.2) has no nonzero solution. Therefore, the set D is linearly
independent and the set D is indeed a basis of W1 + W2 . We now count the vectors in the sets
B, B1 , B2 and D to get the required result.
9.5 When does Norm imply Inner Product
Theorem 9.5.1. Let V be a real vector space. A norm ‖ · ‖ on V is induced by an inner product if and only if, for all x, y ∈ V, the norm satisfies the parallelogram law
‖x + y‖² + ‖x − y‖² = 2‖x‖² + 2‖y‖².
Proof. Suppose that k · k is indeed induced by an inner product. Then, by Exercise 4.2.7.3 the
result follows.
So, let us assume that k · k satisfies the parallelogram law. So, we need to define an inner
product. We claim that the function f : V × V → R defined by
f(x, y) = (1/4)(‖x + y‖² − ‖x − y‖²), for all x, y ∈ V,
satisfies the required conditions for an inner product. So, let us proceed to do so.
Step 1: Clearly, for each x ∈ V, f(x, 0) = 0 and f(x, x) = (1/4)‖x + x‖² = ‖x‖². Thus, f(x, x) ≥ 0. Further, f(x, x) = 0 if and only if x = 0.
Step 3: Now note that ‖x + y‖² − ‖x − y‖² = 2(‖x + y‖² − ‖x‖² − ‖y‖²). Or equivalently,
f(x, y) = (1/2)(‖x + y‖² − ‖x‖² − ‖y‖²).   (9.5.2)
Applying the parallelogram law to the pairs ‖x + y‖², ‖z + y‖² and ‖x‖², ‖z‖², one obtains
f(x, y) + f(z, y) = (1/2) f(x + z, 2y).   (9.5.3)
Now, substituting z = 0 in Equation (9.5.3) and using Equation (9.5.2), we get 2f(x, y) = f(x, 2y) and hence 4f(x + z, y) = 2f(x + z, 2y) = 4(f(x, y) + f(z, y)). Thus,
f(x + z, y) = f(x, y) + f(z, y).   (9.5.4)
Step 4: Using Equation (9.5.4), f (x, y) = f (y, x) and the principle of mathematical induction,
it follows that nf (x, y) = f (nx, y), for all x, y ∈ V and n ∈ N. Another application of
Equation (9.5.4) with f (0, y) = 0 implies that nf (x, y) = f (nx, y), for all x, y ∈ V and
n ∈ Z. Also, for m 6= 0,
m f((n/m)x, y) = f(m · (n/m)x, y) = f(nx, y) = nf(x, y).
Hence, we see that for all x, y ∈ V and a ∈ Q, f (ax, y) = af (x, y).
9.6 Roots of a Polynomial

Definition 9.6.1. [Curve] A curve is a continuous function f : [a, b] → C, where [a, b] ⊆ R.
1. If the function f is one-one on [a, b) and also on (a, b], then it is called a simple curve.
2. If f (b) = f (a), then it is called a closed curve.
3. A closed simple curve is called a Jordan curve.
4. The derivative (integral) of a curve f = u+iv is defined component wise. If f 0 is continuous
on [a, b], we say f is a C 1 -curve (at end points we consider one sided derivatives and
continuity).
5. A C 1 -curve on [a, b] is called a smooth curve, if f 0 is never zero on (a, b).
6. A piecewise smooth curve is called a contour.
7. A simple closed curve is said to be positively oriented if, while traveling on it, the interior of the curve always stays to the left. (Camille Jordan proved that such a curve always divides the plane into two connected regions, one of which is called the bounded region and the other the unbounded region. The bounded one is considered as the interior of the curve.)
Theorem 9.6.2. [Rouche’s Theorem] Let C be a positively oriented simple closed contour.
Also, let f and g be two analytic functions on RC , the union of the interior of C and the curve
C itself. Assume also that |f (x)| > |g(x)|, for all x ∈ C. Then, f and f + g have the same
number of zeros in the interior of C.
Corollary 9.6.3. [Alen Alexanderian, The University of Texas at Austin, USA.] Let P(t) = t^n + a_{n−1}t^{n−1} + · · · + a0 have distinct roots λ1, . . . , λm with multiplicities α1, . . . , αm, respectively. Take any ε > 0 for which the balls Bε(λi) are disjoint. Then, there exists a δ > 0 such that the polynomial q(t) = t^n + a′_{n−1}t^{n−1} + · · · + a′_0 has exactly αi roots (counted with multiplicities) in Bε(λi), whenever |aj − a′_j| < δ.
Hence, by Rouché's theorem, P(z) and q(z) have the same number of zeros inside Cj, for each j = 1, . . . , m. That is, the zeros of q(t) are within the ε-neighborhood of the zeros of P(t).
As a direct application, we obtain the following corollary.
opposite to a0 ; am2 is the first after am1 with sign opposite to am1 ; and so on.
maximum number of positive roots of P (x) = 0 is the number of changes in sign of the
coefficients and that the maximum number of negative roots is the number of sign changes
in P (−x) = 0.
Proof. Assume that a0 , a1 , · · · , an has k > 0 sign changes. Let b > 0. Then, the coeffi-
cients of (x − b)P (x) are
This list has at least k + 1 changes of signs. To see this, assume that a0 > 0 and an 6= 0.
Let the sign changes of ai occur at m1 < m2 < · · · < mk . Then, setting
we see that ci > 0 when i is even and ci < 0, when i is odd. That proves the claim.
Now, assume that P (x) = 0 has k positive roots b1 , b2 , · · · , bk . Then,
9.7 Variational characterizations of Hermitian Matrices

Proof. Proof of Part 1: By the spectral theorem (see Theorem 6.4.10), there exists a unitary matrix U such that A = UDU∗, where D = diag(λ1(A), . . . , λn(A)) is a real diagonal matrix. Thus, the set {U[:, 1], . . . , U[:, n]} is a basis of Cn. Hence, for each x ∈ Cn, there exist scalars αi such that x = Σ_{i=1}^{n} αi U[:, i]. So, note that x∗x = Σ_i |αi|² and
λ1(A) x∗x = λ1(A) Σ_i |αi|² ≤ Σ_i |αi|² λi(A) = x∗Ax ≤ λn(A) Σ_i |αi|² = λn(A) x∗x.
For Part 2 and Part 3, take x = U[:, 1] and x = U[:, n], respectively.
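Numerically, Part 1 says that the Rayleigh quotient of a Hermitian matrix always lies between its smallest and largest eigenvalues. A quick check (our own sketch; NumPy assumed):

    import numpy as np

    rng = np.random.default_rng(0)
    M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    A = (M + M.conj().T) / 2.0               # a Hermitian matrix
    evals = np.linalg.eigvalsh(A)            # real eigenvalues, in increasing order
    x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
    rayleigh = (x.conj() @ A @ x).real / (x.conj() @ x).real
    print(evals[0] - 1e-12 <= rayleigh <= evals[-1] + 1e-12)   # True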
As an immediate corollary, we state the following result.
Proof. Let x ∈ Cn be such that x is orthogonal to U[:, 1], . . . , U[:, k − 1]. Then, we can write x = Σ_{i=k}^{n} αi U[:, i], for some scalars αi. In that case,
λk x∗x = λk Σ_{i=k}^{n} |αi|² ≤ Σ_{i=k}^{n} |αi|² λi = x∗Ax
and the equality occurs for x = U [:, k]. Thus, the required result follows.
Hence, λk ≥ max_{w1,...,wk−1} min_{‖x‖=1, x⊥w1,...,wk−1} x∗Ax, for each choice of k − 1 linearly independent vectors w1, . . . , wk−1.
But, by Proposition 9.7.3, the equality holds for the linearly independent set {U [:, 1], . . . , U [:
, k − 1]} which proves the first equality. A similar argument gives the second equality and hence
the proof is omitted.
Proof. As A and B are Hermitian matrices, the matrix A + B is also Hermitian. Hence, by
Courant-Fischer theorem and Lemma 9.7.1.1,
max_{w1,...,wk−1} min_{‖x‖=1, x⊥w1,...,wk−1} (· · ·)
and
= max_{w1,...,wk−1} min_{‖x‖=1, x⊥w1,...,wk−1, z} x∗Ax.
Theorem 9.7.6. [Cauchy Interlacing Theorem] Let A ∈ Mn(C) be a Hermitian matrix. Define
 = [A  y; y∗  a],
for some a ∈ R and y ∈ Cn. Then, λk(Â) ≤ λk(A) ≤ λk+1(Â), for 1 ≤ k ≤ n.
Theorem 9.7.7. [Inclusion Principle] Let A ∈ Mn(C) be a Hermitian matrix and B be an r × r principal submatrix of A, for some r with 1 ≤ r ≤ n. Then, λk(A) ≤ λk(B) ≤ λk+n−r(A).
Theorem 9.7.8. [Poincare Separation Theorem] Let A ∈ Mn (C) be a Hermitian matrix and
{u1 , . . . , ur } ⊆ Cn be an orthonormal set for some positive integer r, 1 ≤ r ≤ n. If further
B = [bij ] is an r × r matrix with bij = u∗i Auj , 1 ≤ i, j ≤ r then, λk (A) ≤ λk (B) ≤ λk+n−r (A).
Proof. Let us extend the orthonormal set {u1, . . . , ur} to an orthonormal basis, say {u1, . . . , un}, of Cn and write U = [u1 · · · un]. Then, B is an r × r principal submatrix of U∗AU. Thus, by the inclusion principle, λk(U∗AU) ≤ λk(B) ≤ λk+n−r(U∗AU). But, we know that σ(U∗AU) = σ(A) and hence the required result follows.
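The inequalities of Theorem 9.7.8 are easy to test numerically. Below is a small sketch (our own code; NumPy assumed) in which the orthonormal set {u1, . . . , ur} comes from a QR factorization of a random matrix.

    import numpy as np

    rng = np.random.default_rng(1)
    n, r = 5, 2
    M = rng.standard_normal((n, n))
    A = (M + M.T) / 2.0                                 # real symmetric, hence Hermitian
    U, _ = np.linalg.qr(rng.standard_normal((n, r)))    # n x r, orthonormal columns
    B = U.T @ A @ U                                     # r x r with b_ij = u_i^* A u_j
    lam_A = np.linalg.eigvalsh(A)                       # increasing order
    lam_B = np.linalg.eigvalsh(B)
    for k in range(r):                                  # 0-based k corresponds to k+1 above
        ok = (lam_A[k] - 1e-10 <= lam_B[k] <= lam_A[k + n - r] + 1e-10)
        print(ok)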
The proof of the next result is left for the reader.
Corollary 9.7.9. Let A ∈ Mn (C) be a Hermitian matrix and r be a positive integer with
1 ≤ r ≤ n. Then,
Now assume that x∗Ax > 0 holds for each nonzero x ∈ W and that λ_{n−k+1} = 0. Then, it follows that min_{‖x‖=1, x⊥x1,...,xn−k} x∗Ax = 0. Now, define f : Cn → C by f(x) = x∗Ax.
Then, f is a continuous function and min_{‖x‖=1, x∈W} f(x) = 0. Thus, f must attain its minimum on the unit sphere in W. That is, there exists y ∈ W with ‖y‖ = 1 such that y∗Ay = 0, a contradiction. Thus, the required result follows.
Index
Cofactor, 57 Row, 9
Column, 9 Row Echelon Form, 30
Commuting Family, 218 Row Equivalence, 34
Companion, 197 Row-Reduced Echelon Form, 38
Defective, 166 Scalar, 10
Determinant, 55 Scalar Multiplication, 11
Diagonal, 10 Singular, 56
Diagonalization, 164 Size, 9
Eigen-pair, 156 Skew-Hermitian, 19
Eigenvalue, 156 Skew-Symmetric, 19
Eigenvector, 156 Spectral Radius, 156
Elementary, 33 Spectrum, 156
Generalized Inverse, 24 Square, 10
Hermitian, 19 Submatrix, 21
Householder, 24, 213 Symmetric, 19
Idempotent, 20 Toeplitz, 203
Identity, 10 Trace, 23
Inverse, 17 triangular, 10
Jordan, 190 Unitary, 19
Principal Submatrix, 21
Mm,n (C), 19
Properties of Determinant, 230
Mn,1 (C), 11
Vector
Column, 9
Coordinate, 132
Row, 9
Unit, 19
Vector Space, 66
Basis, 85
Complex, 66
Complex n-tuple, 67
Dimension of M + N , 233
Finite Dimensional, 73
Infinite Dimensional, 73
Inner Product, 97
Isomorphic, 130
Minimal spanning set, 85
Real, 66
Real n-tuple, 67
Subspace, 69
Vector Subspace, 69
Vectors
Angle, 100
Length, 99
Linear Combination, 72
Linear Dependence, 77
Linear Independence, 77
Linear Span, 73
Mutually Orthogonal, 105
Norm, 99
Orthogonal, 103
Orthonormal, 105