MATH 233 - Linear Algebra I
Lecture Notes
Cesar O. Aguilar
Department of Mathematics
SUNY Geneseo
Contents
3 Vector Equations
3.1 Vectors in Rn
3.2 The linear combination problem
3.3 The span of a set of vectors
6 Linear Independence
6.1 Linear independence
6.2 The maximum size of a linearly independent set
9 Matrix Algebra
9.1 Sums of Matrices
9.2 Matrix Multiplication
9.3 Matrix Transpose
10 Invertible Matrices
10.1 Inverse of a Matrix
10.2 Computing the Inverse of a Matrix
10.3 Invertible Linear Mappings
11 Determinants
11.1 Determinants of 2 × 2 and 3 × 3 Matrices
11.2 Determinants of n × n Matrices
11.3 Triangular Matrices
23 Diagonalization
23.1 Eigenvalues of Triangular Matrices
23.2 Diagonalization
23.3 Conditions for Diagonalization
Lecture 1
Systems of Linear Equations
In this lecture, we will introduce linear systems and the method of row reduction to solve
them. We will introduce matrices as a convenient structure to represent and solve linear
systems. Lastly, we will discuss geometric interpretations of the solution set of a linear
system in two and three dimensions.
A general linear system of m equations in n unknown variables x1, x2, . . . , xn has the form
a11x1 + a12x2 + · · · + a1nxn = b1
a21x1 + a22x2 + · · · + a2nxn = b2
...
am1x1 + am2x2 + · · · + amnxn = bm     (1.1)
The numbers aij are called the coefficients of the linear system; because there are m equations and n unknown variables there are therefore m × n coefficients. The main problem with a linear system is of course to solve it:
Problem: Find a list of n numbers (s1 , s2 , . . . , sn ) that satisfy the system of linear equa-
tions (1.1).
In other words, if we substitute the list of numbers (s1 , s2 , . . . , sn ) for the unknown
variables (x1 , x2 , . . . , xn ) in equation (1.1) then the left-hand side of the ith equation will
equal bi . We call such a list (s1 , s2 , . . . , sn ) a solution to the system of equations. Notice
that we say “a solution” because there may be more than one. The set of all solutions to a
linear system is called its solution set. As an example of a linear system, below is a linear system of three equations in the three unknowns x1, x2, x3:
x1 − 5x2 − 7x3 = 0
5x2 + 11x3 = 1
−5x1 + x2 = −1
Here is another example of a linear system:
πx1 − 5x2 = 0
√63 x1 − 2x2 = −7
Example 1.2. Verify that (1, 2, −4) is a solution, and that (1, −1, 2) is not a solution, to the system of equations
2x1 + 2x2 + x3 = 2
x1 + 3x2 − x3 = 11.
Solution. Substituting (x1, x2, x3) = (1, 2, −4) into both equations gives 2(1) + 2(2) + (−4) = 2 and 1 + 3(2) − (−4) = 11, so both equations are satisfied. Substituting (1, −1, 2) into the first equation gives 2(1) + 2(−1) + 2 = 2, but substituting into the second gives
1 + 3(−1) − 2 = −4 ≠ 11.
Thus, (1, −1, 2) is not a solution to the system.
A linear system may not have a solution at all. If this is the case, we say that the linear
system is inconsistent:
INCONSISTENT ⇔ NO SOLUTION
We will see shortly that a consistent linear system will have either just one solution or
infinitely many solutions. For example, a linear system cannot have just 4 or 5 solutions. If
it has multiple solutions, then it will have infinitely many solutions.
Example 1.3. Show that the linear system does not have a solution.
−x1 + x2 = 3
x1 − x2 = 1.
Solution. Adding the two equations together gives
0 = 4
which is a contradiction. Therefore, there does not exist a list (s1 , s2 ) that satisfies the
system because this would lead to the contradiction 0 = 4.
Example. Let t be an arbitrary real number and set
s1 = −3/2 − 2t
s2 = 3/2 + t
s3 = t.
Show that for any choice of the parameter t, the list (s1 , s2 , s3 ) is a solution to the linear
system
x1 + x2 + x3 = 0
x1 + 3x2 − x3 = 3.
Solution. Substituting the list (s1, s2, s3) into the left-hand side of the first equation gives
(−3/2 − 2t) + (3/2 + t) + t = 0
and substituting into the left-hand side of the second equation gives
(−3/2 − 2t) + 3(3/2 + t) − t = 3.
Both equations are satisfied for any value of t. Because we can vary t arbitrarily, we get an
infinite number of solutions parameterized by t. For example, compute the list (s1 , s2 , s3 )
for t = 3 and confirm that the resulting list is a solution to the linear system.
1.2 Matrices
We will use matrices to develop systematic methods to solve linear systems and to study
the properties of the solution set of a linear system. Informally speaking, a matrix is an
array or table consisting of rows and columns. For example,
A =
1 −2 1 0
0 2 −8 8
−4 7 11 −5
is a matrix having m = 3 rows and n = 4 columns. In general, a matrix with m rows and n columns is an m × n matrix and the set of all such matrices will be denoted by Mm×n.
Hence, A above is a 3 × 4 matrix. The entry of A in the ith row and jth column will be
denoted by aij . A matrix containing only one column is called a column vector and a
matrix containing only one row is called a row vector. For example, here is a row vector
u = [1 −3 4]
and here is a column vector
v =
3
−1
We can associate to a linear system three matrices: (1) the coefficient matrix, (2) the
output column vector, and (3) the augmented matrix. For example, for the linear system
5x1 − 3x2 + 8x3 = −1
x1 + 4x2 − 6x3 = 0
2x2 + 4x3 = 3
the coefficient matrix A, the output vector b, and the augmented matrix [A b] are:
A =
5 −3 8
1 4 −6
0 2 4
b =
−1
0
3
[A b] =
5 −3 8 −1
1 4 −6 0
0 2 4 3
If a linear system has m equations and n unknowns then the coefficient matrix A must be an m × n matrix, that is, A has m rows and n columns. Using our previously defined notation,
we can write this as A ∈ Mm×n .
If we are given an augmented matrix, we can write down the associated linear system in
an obvious way. For example, the linear system associated to the augmented matrix
1 4 −2 8 12
0 1 −7 2 −4
0 0 5 −1 7
is
x1 + 4x2 − 2x3 + 8x4 = 12
x2 − 7x3 + 2x4 = −4
5x3 − x4 = 7.
We can study matrices without interpreting them as coefficient matrices or augmented ma-
trices associated to a linear system. Matrix algebra is a fascinating subject with numerous
applications in every branch of engineering, medicine, statistics, mathematics, finance, biol-
ogy, chemistry, etc.
To solve a linear system, we will apply three kinds of elementary operations:
1. (Type 1) Interchange two equations.
2. (Type 2) Multiply an equation by a nonzero constant.
3. (Type 3) Add a multiple of one equation to another.
These operations do not alter the solution set.
tively to simplify the linear system to a point where one can easily write down the solution
set. It is convenient to apply elementary operations on the augmented matrix [A b] repre-
senting the linear system. In this case, we call the operations elementary row operations,
and the process of simplifying the linear system using these operations is called row reduc-
tion. The goal with row reducing is to transform the original linear system into one having
a triangular structure and then perform back substitution to solve the system. This is
best explained via an example.
Example 1.6. Solve the linear system using elementary row operations.
−3x1 + 2x2 + 4x3 = 12
x1 − 2x3 = −4
2x1 − 3x2 + 4x3 = −3
Solution. Our goal is to perform elementary row operations to obtain a triangular structure
and then use back substitution to solve. The augmented matrix is
−3 2 4 12
1 0 −2 −4
2 −3 4 −3
Interchange R1 and R2; as you will see, this first operation will simplify the next step:
1 0 −2 −4
−3 2 4 12
2 −3 4 −3
Add 3R1 to R2:
1 0 −2 −4
0 2 −2 0
2 −3 4 −3
Add −2R1 to R3:
1 0 −2 −4
0 2 −2 0
0 −3 8 5
Multiply R2 by 1/2:
1 0 −2 −4
0 1 −1 0
0 −3 8 5
Add 3R2 to R3:
1 0 −2 −4
0 1 −1 0
0 0 5 5
Multiply R3 by 1/5:
1 0 −2 −4
0 1 −1 0
0 0 1 1
We could continue row reducing, but the augmented matrix is already in triangular form.
So now use back substitution to solve. The linear system associated to the row reduced
augmented matrix is
x1 − 2x3 = −4
x2 − x3 = 0
x3 = 1
The last equation gives that x3 = 1. From the second equation we obtain that x2 − x3 = 0,
and thus x2 = 1. The first equation then gives that x1 = −4 + 2(1) = −2. Thus, the solution
to the original system is (−2, 1, 1). You should verify that (−2, 1, 1) is a solution to the
original system.
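As a supplementary check (not part of the original notes), the system of Example 1.6 can also be solved numerically, assuming NumPy is available:

import numpy as np

# Coefficient matrix and right-hand side of the system in Example 1.6.
A = np.array([[-3.0, 2.0, 4.0],
              [1.0, 0.0, -2.0],
              [2.0, -3.0, 4.0]])
b = np.array([12.0, -4.0, -3.0])

# numpy.linalg.solve handles square systems with an invertible matrix.
x = np.linalg.solve(A, b)
print(x)  # expected: [-2.  1.  1.]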
Example 1.7. Using elementary row operations, show that the linear system is inconsistent.
x1 + 2x3 = 1
x2 + x3 = 0
2x1 + 4x3 = 1
Solution. The augmented matrix is
1 0 2 1
0 1 1 0
2 0 4 1
Perform the operation −2R1 + R3:
1 0 2 1
0 1 1 0
0 0 0 −1
The last row of the simplified augmented matrix corresponds to the equation
0x1 + 0x2 + 0x3 = −1
which clearly has no solution, and therefore the linear system is inconsistent. In general, if row reduction produces a row of the form
[0 0 · · · 0 c]
where c is a nonzero number, then the linear system is inconsistent. We will call this type
of row an inconsistent row. However, a row of the form
[0 1 0 0 0]
is not inconsistent; it simply corresponds to the equation x2 = 0. Consider now the linear system of two equations in two unknowns
x1 − 2x2 = −1
−x1 + 3x2 = 3.     (1.2)
The solution set of the system
is the intersection of the two lines determined by the equations of the system. The solution
for this system is (3, 2). The two lines intersect at the point (x1 , x2 ) = (3, 2), see Figure 1.1.
Figure 1.1: The intersection point of the two lines is the solution of the linear system (1.2)
The solution set of a consistent linear system of three equations in three unknowns is the intersection of the three planes determined by the equations of the system. The three planes may intersect at a single point, in which case the system has exactly one solution; for one such system the intersection point is (29, 16, 3). In the case of a consistent system of two equations,
the solution set is the line of intersection of the two planes determined by the equations of
the system, see Figure 1.2.
x1 − 2x2 + x3 = 0
−4x1 + 5x2 + 9x3 = −9     (1.3)
Figure 1.2: The intersection of the two planes is the solution set of the linear system (1.3)
Lecture 2
Row Reduction and Echelon Forms
In this lecture, we will get more practice with row reduction and in the process introduce
two important types of matrix forms. We will also discuss when a linear system has a unique
solution, infinitely many solutions, or no solution. Lastly, we will introduce a convenient
number called the rank of a matrix.
Consider the following two properties of a matrix:
P1. All nonzero rows are above any rows of all zeros.
P2. The leftmost nonzero entry of a row is to the right of the leftmost nonzero entry of
the row above it.
Any matrix satisfying properties P1 and P2 is said to be in row echelon form (REF). In
REF, the leftmost nonzero entry in a row is called a leading entry:
1 5 0 −2 −1 7 −4
0 2 −2 0 0 3 0
0 0 0 −9 −1 1 −1
0 0 0 0 5 1 5
0 0 0 0 0 0 0
A consequence of property P2 is that every entry below a leading entry is zero:
1 5 0 −2 −4 −1 −7
0 2 −2 0 0 3 0
0 0 0 −9 −1 1 −1
0 0 0 0 5 1 5
0 0 0 0 0 0 0
We can perform elementary row operations, or row reduction, to transform a matrix into
REF.
Example 2.1. Explain why the following matrices are not in REF. Use elementary row
operations to put them in REF.
M =
3 −1 0 3
0 0 0 0
0 1 3 0
N =
7 5 0 −3
0 3 −1 1
0 6 −5 2
Solution. Matrix M fails property P1. To put M in REF we interchange R2 with R3 :
3 −1 0 3
0 1 3 0
0 0 0 0
The matrix N fails property P2. To put N in REF we perform the operation −2R2 + R3 →
R3 :
7 5 0 −3
0 3 −1 1
0 0 −3 0
Why is REF useful? Certain properties of a matrix can be easily deduced if it is in REF.
For now, REF is useful to us for solving a linear system of equations. If an augmented matrix
is in REF, we can use back substitution to solve the system, just as we did in Lecture 1.
For example, consider the system
8x1 − 2x2 + x3 = 4
3x2 − x3 = 7
2x3 = 4
whose augmented matrix is in REF. Back substitution gives x3 = 2, then 3x2 = 7 + x3 = 9 so x2 = 3, and then 8x1 = 4 + 2x2 − x3 = 8 so x1 = 1.
A matrix in REF satisfying the following two additional properties is said to be in reduced row echelon form (RREF):
P3. The leading entry in each nonzero row is a 1.
P4. All the entries above (and below) a leading 1 are zero.
A leading 1 in the RREF of a matrix is called a pivot. For example, the following matrix
in RREF:
1 6 0 3 0 0
0 0 1 −4 0 5
0 0 0 0 1 7
has three pivots, namely the leading 1s in columns 1, 3, and 5.
Example 2.2. Use row reduction to transform the matrix into RREF.
0 3 −6 6 4 −5
3 −7 8 −5 8 9
3 −9 12 −9 6 15
Solution. The first step is to make the top leftmost entry nonzero:
Interchange R1 and R3:
3 −9 12 −9 6 15
3 −7 8 −5 8 9
0 3 −6 6 4 −5
Now create a leading 1 in the first row by multiplying R1 by 1/3:
1 −3 4 −3 2 5
3 −7 8 −5 8 9
0 3 −6 6 4 −5
Continuing, we add −3R1 to R2, multiply R2 by 1/2, and then add −3R2 to R3, which brings the matrix into REF:
1 −3 4 −3 2 5
0 1 −2 2 1 −3
0 0 0 0 1 4
We have now completed the top-to-bottom phase of the row reduction algorithm. In the
next phase, we work bottom-to-top and create zeros above the leading 1’s. Create zeros
above the leading 1 in the third row:
Add −R3 to R2:
1 −3 4 −3 2 5
0 1 −2 2 0 −7
0 0 0 0 1 4
Add −2R3 to R1:
1 −3 4 −3 0 −3
0 1 −2 2 0 −7
0 0 0 0 1 4
Create zeros above the leading 1 in the second row:
Add 3R2 to R1:
1 0 −2 3 0 −24
0 1 −2 2 0 −7
0 0 0 0 1 4
This completes the row reduction algorithm and the matrix is in RREF.
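As an aside (not from the notes themselves), this RREF computation can be checked with SymPy, whose Matrix.rref() returns the reduced matrix together with the pivot columns:

import sympy as sp

# The matrix from Example 2.2.
M = sp.Matrix([[0, 3, -6, 6, 4, -5],
               [3, -7, 8, -5, 8, 9],
               [3, -9, 12, -9, 6, 15]])
R, pivots = M.rref()
print(R)       # matches the RREF computed above
print(pivots)  # expected: (0, 1, 4)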
Consider next a consistent system whose row reduced augmented matrix corresponds to the equations
x1 + 2x2 + 3x3 = 4
x3 = 4.
The last equation gives x3 = 4, and substituting x3 = 4 into the first equation gives
x1 + 2x2 = −8.
We now must choose one of the variables x1 or x2 to be a parameter, say t, and solve for the
remaining variable. If we set x2 = t then from x1 + 2x2 = −8 we obtain that
x1 = −8 − 2t.
We can therefore write the solution set for the linear system as
x1 = −8 − 2t
x2 = t (2.1)
x3 = 4
where t can be any real number. If we had chosen x1 to be the parameter, say x1 = t,
then the solution set can be written as
x1 = t
x2 = −4 − (1/2)t     (2.2)
x3 = 4
Although (2.1) and (2.2) are two different parameterizations, they both give the same solution
set.
In general, if a linear system has n unknown variables and the row reduced augmented
matrix has r leading entries, then the number of free parameters d in the solution set is
d = n − r.
Thus, when performing back substitution, we will have to set d of the unknown variables
to arbitrary parameters. In the previous example, there are n = 3 unknown variables and
the row reduced augmented matrix contained r = 2 leading entries. The number of free
parameters was therefore
d = n − r = 3 − 2 = 1.
Because the number of leading entries r in the row reduced coefficient matrix determines the number of free parameters, we will refer to r as the rank of the coefficient matrix:
r = rank(A).
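As a hedged illustration (not part of the notes), NumPy can compute the rank r and hence the number of free parameters d = n − r; here we use the coefficient matrix of the reduced system solved above:

import numpy as np

# Coefficient matrix of: x1 + 2x2 + 3x3 = 4, x3 = 4.
A = np.array([[1, 2, 3],
              [0, 0, 1]])
r = np.linalg.matrix_rank(A)  # expected: 2
n = A.shape[1]                # n = 3 unknowns
print(r, n - r)               # expected: 2 1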
Example 2.4. Solve the linear system represented by the augmented matrix
1 −7 2 −5 8 10
0 1 −3 3 1 −5
0 0 0 1 −1 4
Solution. The number of unknowns is n = 5 and the augmented matrix has rank r = 3
(leading entries). Thus, the solution set is parameterized by d = 5 − 3 = 2 free variables,
call them t and s. The last equation of the augmented matrix is x4 − x5 = 4. We choose x5
to be the first parameter so we set x5 = t. Therefore, x4 = 4 + t. The second equation of
the augmented matrix is
x2 − 3x3 + 3x4 + x5 = −5
and the unassigned variables are x2 and x3 . We choose x3 to be the second parameter, say
x3 = s. Then
x2 = −5 + 3x3 − 3x4 − x5
= −5 + 3s − 3(4 + t) − t
= −17 − 4t + 3s.
We now use the first equation of the augmented matrix, x1 − 7x2 + 2x3 − 5x4 + 8x5 = 10, to write x1 in terms of the other variables:
x1 = 10 + 7x2 − 2x3 + 5x4 − 8x5
= 10 + 7(−17 − 4t + 3s) − 2s + 5(4 + t) − 8t
= −89 − 31t + 19s.
The general solution is therefore
x1 = −89 − 31t + 19s
x2 = −17 − 4t + 3s
x3 = s
x4 = 4 + t
x5 = t
where t and s are arbitrary real numbers. Choose arbitrary numbers for t and s and
substitute the corresponding list (x1 , x2 , . . . , x5 ) into the system of equations to verify that
it is a solution.
Theorem 2.5: Let [A b] be the augmented matrix of a linear system. One of the following
distinct possibilities will occur:
1. There is an inconsistent row, that is, a row of the form [0 0 · · · 0 c] where c ≠ 0.
2. All the rows of the augmented matrix are consistent and there are no free parameters.
3. All the rows of the augmented matrix are consistent and there are d ≥ 1 variables that must be set to arbitrary parameters.
In Case 1., the linear system is inconsistent and thus has no solution. In Case 2., the linear
system is consistent and has only one (and thus unique) solution. This case occurs when
r = rank(A) = n since then the number of free parameters is d = n − r = 0. In Case 3., the
linear system is consistent and has infinitely many solutions. This case occurs when r < n
and thus d = n − r > 0 is the number of free parameters.
Lecture 3
Vector Equations
In this lecture, we introduce vectors and vector equations. Specifically, we introduce the
linear combination problem which simply asks whether it is possible to express one vector
in terms of other vectors; we will be more precise in what follows. As we will see, solving
the linear combination problem reduces to solving a linear system of equations.
3.1 Vectors in Rn
Recall that a column vector in Rn is an n × 1 matrix. From now on, we will drop the
“column” descriptor and simply use the word vectors. It is important to emphasize that a
vector in Rn is simply a list of n numbers; you are safe (and highly encouraged!) to forget
the idea that a vector is an object with an arrow. Here is a vector in R2 :
v =
3
−1
Here is a vector in R3:
v =
−3
0
11
Here is a vector in R6:
v =
9
0
−3
6
0
3
To indicate that v is a vector in Rn, we will use the notation v ∈ Rn. The mathematical symbol ∈ means "is an element of". When we write vectors within a paragraph, we will write them using list notation instead of column notation, e.g., v = (−1, 4) instead of a column with entries −1 and 4.
We can add/subtract vectors, and multiply vectors by numbers or scalars. For example,
here is the addition of two vectors:
(0, −5, 9, 2) + (4, −3, 0, 1) = (4, −8, 9, 3).
And the multiplication of a scalar with a vector:
3(1, −3, 5) = (3, −9, 15).
And here are both operations combined:
−2(4, −8, 3) + 3(−2, 9, 4) = (−8, 16, −6) + (−6, 27, 12) = (−14, 43, 6).
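As a quick aside (this snippet is not part of the original notes), the same vector arithmetic can be reproduced with NumPy arrays, assuming NumPy is available:

import numpy as np

print(np.array([0, -5, 9, 2]) + np.array([4, -3, 0, 1]))     # [ 4 -8  9  3]
print(3 * np.array([1, -3, 5]))                              # [ 3 -9 15]
print(-2 * np.array([4, -8, 3]) + 3 * np.array([-2, 9, 4]))  # [-14  43   6]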
These operations constitute “the algebra” of vectors. As the following example illustrates,
vectors can be used in a natural way to represent the solution of a linear system.
Example 3.1. Write the general solution in vector form of the linear system represented
by the augmented matrix
[A b] =
1 −7 2 −5 8 10
0 1 −3 3 1 −5
0 0 0 1 −1 4
Solution. The number of unknowns is n = 5 and the associated coefficient matrix A has
rank r = 3. Thus, the solution set is parametrized by d = n − r = 2 parameters. This
system was considered in Example 2.4 and the general solution was found to be
x1 = −89 − 31t1 + 19t2
x2 = −17 − 4t1 + 3t2
x3 = t2
x4 = 4 + t1
x5 = t1
where t1 and t2 are arbitrary real numbers. The solution in vector form therefore takes the form
x = (x1, x2, x3, x4, x5) = (−89, −17, 0, 4, 0) + t1(−31, −4, 0, 1, 1) + t2(19, 3, 1, 0, 0).
Consider the vectors v1 = (4, −8, 3), v2 = (−2, 9, 4), and b = (−14, 43, 6), and ask whether there exist scalars x1, x2 such that x1v1 + x2v2 = b. Here the unknowns are the scalars x1 and x2. After some guess and check, we find that x1 = −2 and x2 = 3 is a solution to the problem since
−2(4, −8, 3) + 3(−2, 9, 4) = (−14, 43, 6).
In some sense, the vector b is a combination of the vectors v1 and v2 . This motivates the
following definition.
Definition 3.2: Given vectors v1, v2, . . . , vp and scalars x1, x2, . . . , xp, the vector
b = x1v1 + x2v2 + · · · + xpvp
is called a linear combination of the vectors v1, v2, . . . , vp. The scalars in a linear combination are called the coefficients of the linear combination.
As an example, given the vectors
v1 = (1, −2, 3), v2 = (−2, 4, −6), v3 = (−1, 5, 6), b = (−3, 0, −27),
we ask: do there exist scalars x1, x2, x3 such that
x1 v1 + x2 v2 + x3 v3 = b? (3.1)
For obvious reasons, equation (3.1) is called a vector equation and the unknowns are x1 ,
x2 , and x3 . To gain some intuition with the linear combination problem, let’s do an example
by inspection.
Example 3.3. Let v1 = (1, 0, 0), let v2 = (0, 0, 1), let b1 = (0, 2, 0), and let b2 = (−3, 0, 7).
Are b1 and b2 linear combinations of v1 , v2 ?
Solution. For any scalars x1 and x2
x1v1 + x2v2 = (x1, 0, 0) + (0, 0, x2) = (x1, 0, x2) ≠ (0, 2, 0) = b1
and thus no, b1 is not a linear combination of v1, v2. On the other hand, by inspection
we have that
−3v1 + 7v2 = (−3, 0, 0) + (0, 0, 7) = (−3, 0, 7) = b2
and thus yes, b2 is a linear combination of v1, v2. These examples, of low dimension,
were more-or-less obvious. Going forward, we are going to need a systematic way to solve
the linear combination problem that does not rely on pure inspection.
We now describe how the linear combination problem is connected to the problem of
solving a system of linear equations. Consider again the vectors
v1 = (1, 2, 1), v2 = (1, 1, 0), v3 = (2, 1, 2), b = (0, 1, −2).
Do there exist scalars x1, x2, x3 such that
x1 v1 + x2 v2 + x3 v3 = b? (3.2)
Row reducing the associated augmented matrix [v1 v2 v3 b], one finds the unique solution
x1 = 0, x2 = 2, x3 = −1
and thus these coefficients solve the linear combination problem. In other words,
0v1 + 2v2 − v3 = b.
In this case, there is only one solution to the linear system, so b can be written as a linear combination of v1, v2, v3 in only one (unique) way. You should verify these computations.
We summarize the previous discussion with the following: b is a linear combination of v1, v2, . . . , vp if and only if the linear system whose augmented matrix is [v1 v2 · · · vp b] is consistent.
Applying the existence and uniqueness Theorem 2.5, the only three possibilities for the linear combination problem are:
1. If the linear system is inconsistent, then b cannot be written as a linear combination of v1, v2, . . . , vp.
2. If the linear system is consistent and the solution is unique then b can be written as a
linear combination of v1 , v2 , . . . , vp in only one way.
3. If the linear system is consistent and the solution set has free parameters, then b can be written as a linear combination of v1, v2, . . . , vp in infinitely many ways.
Example 3.4. Is the vector b = (7, 4, −3) a linear combination of the vectors
v1 = (1, −2, −5), v2 = (2, 5, 6)?
Solution. Row reducing the augmented matrix [v1 v2 b], one finds that the system is consistent with the unique solution x1 = 3, x2 = 2. Therefore, yes, b = 3v1 + 2v2 is a linear combination of v1 and v2, in exactly one way.
Example 3.6. Is the vector b = (8, 8, 12) a linear combination of the vectors
v1 = (2, 1, 3), v2 = (4, 2, 6), v3 = (6, 4, 9)?
Solution. The augmented matrix is
2 4 6 8
1 2 4 8
3 6 9 12
and row reducing it to REF gives
1 2 3 4
0 0 1 4
0 0 0 0
The system is consistent and therefore b is a linear combination of v1 , v2 , v3 . In this case,
the solution set contains d = 1 free parameter and therefore, it is possible to write b as a
linear combination of v1 , v2 , v3 in infinitely many ways. In terms of the parameter t, the
solution set is
x1 = −8 − 2t
x2 = t
x3 = 4
Choosing any t gives scalars that can be used to write b as a linear combination of v1 , v2 , v3 .
For example, choosing t = 1 we obtain x1 = −10, x2 = 1, and x3 = 4, and you can verify
that
−10v1 + v2 + 4v3 = −10(2, 1, 3) + (4, 2, 6) + 4(6, 4, 9) = (8, 8, 12) = b
Or, choosing t = −2 we obtain x1 = −4, x2 = −2, and x3 = 4, and you can verify that
−4v1 − 2v2 + 4v3 = −4(2, 1, 3) − 2(4, 2, 6) + 4(6, 4, 9) = (8, 8, 12) = b
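Both coefficient choices can be verified numerically; this NumPy sketch is a supplementary check, not part of the original notes:

import numpy as np

v1 = np.array([2, 1, 3])
v2 = np.array([4, 2, 6])
v3 = np.array([6, 4, 9])

# Two of the infinitely many ways to produce b = (8, 8, 12).
print(-10 * v1 + 1 * v2 + 4 * v3)  # expected: [ 8  8 12]
print(-4 * v1 - 2 * v2 + 4 * v3)   # expected: [ 8  8 12]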
Thus, by construction, the vector b = (8, 8, 12) is a linear combination of {v1 , v2 , v3 }. This
discussion leads us to the following definition.
Definition 3.7: Let v1 , v2 , . . . , vp be vectors. The set of all vectors that are a linear
combination of v1 , v2 , . . . , vp is called the span of v1 , v2 , . . . , vp , and we denote it by
S = span{v1 , v2 , . . . , vp }.
In R2 , the span of two vectors v1 , v2 ∈ R2 that are not multiples of each other is all of
R2 . That is, span{v1 , v2 } = R2 . For example, with v1 = (1, 0) and v2 = (0, 1), it is true
that span{v1 , v2 } = R2 . In R3 , the span of two vectors v1 , v2 ∈ R3 that are not multiples
of each other is a plane through the origin containing v1 and v2 , see Figure 3.2. In R3 , the
Figure 3.2: The span of two vectors, not multiples of each other, in R3 .
span of a single vector is a line through the origin, and the span of three vectors that do not
depend on each other (we will make this precise soon) is all of R3 .
Example 3.8. Is the vector b = (7, 4, −3) in the span of the vectors v1 = (1, −2, −5), v2 =
(2, 5, 6)? In other words, is b ∈ span{v1 , v2 }?
Solution. By definition, b is in the span of v1 and v2 if there exist scalars x1 and x2 such
that
x1 v1 + x2 v2 = b,
that is, if b can be written as a linear combination of v1 and v2 . From our previous
discussion on the linear combination problem, we must consider the augmented matrix [v1 v2 b].
Using row reduction, the augmented matrix is consistent and there is only one solution (see
Example 3.4). Therefore, yes, b ∈ span{v1 , v2 } and the linear combination is unique.
Example 3.9. Is the vector b = (1, 0, 1) in the span of the vectors v1 = (1, 0, 2), v2 = (0, 1, 0), v3 = (2, 1, 4)?
Solution. The associated linear system is the one considered in Example 1.7, where row reduction produced an inconsistent row. Therefore, b is not in span{v1, v2, v3}.
Example 3.10. Is the vector b = (8, 8, 12) in the span of the vectors v1 = (2, 1, 3), v2 =
(4, 2, 6), v3 = (6, 4, 9)?
The system is consistent and therefore b ∈ span{v1 , v2 , v3 }. In this case, the solution set
contains d = 1 free parameter and therefore, it is possible to write b as a linear combination
of v1 , v2 , v3 in infinitely many ways.
Example 3.11. Answer the following with True or False, and explain your answer.
(a) The vector b = (1, 2, 3) is in the span of the set of vectors
(−1, 3, 0), (2, −7, 0), (4, −5, 0).
(b) The solution set of the linear system whose augmented matrix is [v1 v2 v3 b] is the
same as the solution set of the vector equation
x1 v1 + x2 v2 + x3 v3 = b.
(c) Suppose that the augmented matrix [v1 v2 v3 b] has an inconsistent row. Then
either b can be written as a linear combination of v1 , v2 , v3 or b ∈ span{v1 , v2 , v3 }.
(d) The span of the vectors {v1 , v2 , v3 } (at least one of which is nonzero) contains only the
vectors v1 , v2 , v3 and the zero vector 0.
Lecture 4
The Matrix Equation Ax = b
In this lecture, we introduce the operation of matrix-vector multiplication and how it relates
to the linear combination problem.
Given a matrix A ∈ Mm×n with rows a1, a2, . . . , am and a vector x ∈ Rn, the product Ax is the vector in Rm whose ith entry is ai1x1 + ai2x2 + · · · + ainxn. For the product Ax to be well-defined, the number of columns of A must equal the number of components of x. Another way of saying this is that the inner dimensions must agree:
(m × n) · (n × 1) → m × 1
Example. For each of the following, compute the product Ax.
(a)
A = [1 −1 3 0], x = (2, −4, −3, 8)
(b)
A =
3 3 −2
4 −4 −1
x = (1, 0, −1)
(c)
A =
−1 1 0
4 1 −2
3 −3 3
0 −2 −3
x = (−1, 2, −2)
Solution. We compute:
(a)
Ax = (1)(2) + (−1)(−4) + (3)(−3) + (0)(8) = −3
(b)
Ax = ((3)(1) + (3)(0) + (−2)(−1), (4)(1) + (−4)(0) + (−1)(−1)) = (5, 5)
(c)
The entries of Ax are
(−1)(−1) + (1)(2) + (0)(−2) = 3
(4)(−1) + (1)(2) + (−2)(−2) = 2
(3)(−1) + (−3)(2) + (3)(−2) = −15
(0)(−1) + (−2)(2) + (−3)(−2) = 2
and therefore Ax = (3, 2, −15, 2).
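As a supplementary check (not from the notes), part (c) can be reproduced with NumPy's @ operator for matrix-vector multiplication:

import numpy as np

A = np.array([[-1, 1, 0],
              [4, 1, -2],
              [3, -3, 3],
              [0, -2, -3]])
x = np.array([-1, 2, -2])
print(A @ x)  # expected: [  3   2 -15   2]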
Theorem 4.3: Let A ∈ Mm×n, let u, v ∈ Rn, and let c be a scalar. Then
A(u + v) = Au + Av
A(cu) = c(Au).
Example 4.4. For the given data, verify that the properties of Theorem 4.3 hold:
A =
3 −3
2 1
u = (−1, 3), v = (2, −1), c = −2.
If A = [v1 v2 · · · vn] and x = (x1, x2, . . . , xn), then
Ax = x1v1 + x2v2 + · · · + xnvn.
In summary, the vector Ax is a linear combination of the columns of A where the scalars in the linear combination are the components of x! This (important) observation gives an
alternative way to compute Ax.
Given A ∈ Mm×n and b ∈ Rm, consider the matrix equation
Ax = b. (⋆)
Equation (⋆) is a matrix equation where the unknown variable is x. If u is a vector such
that Au = b, then we say that u is a solution to the equation Ax = b. For example,
suppose that
A =
1 0
1 0
b =
−3
7
Does the equation Ax = b have a solution? Well, for any x = (x1, x2) we have that
Ax = (x1, x1)
and thus any output vector Ax has equal entries. Since b does not have equal entries then
the equation Ax = b has no solution.
We now describe a systematic way to solve matrix equations. As we have seen, the vector
Ax is a linear combination of the columns of A with the coefficients given by the components
of x. Therefore, the matrix equation problem is equivalent to the linear combination problem.
In Lecture 3, we showed that the linear combination problem can be solved by solving a system of linear equations. Putting all this together then, if A = [v1 v2 · · · vn] and
b ∈ Rm then:
Ax = b has a solution ⇐⇒ b is a linear combination of the columns of A ⇐⇒ the augmented matrix [A b] is consistent.
Theorem 4.6: Let A ∈ Mm×n and b ∈ Rm . The following statements are equivalent:
(a) The equation Ax = b has a solution.
(b) The vector b is a linear combination of the columns of A.
(c) The linear system represented by the augmented matrix [A b] is consistent.
The last row is inconsistent and therefore there is no solution to the matrix equation Ax = b.
In other words, b is not a linear combination of the columns of A.
Example. Solve the matrix equation Ax = b, where A = [1 −1 2; 0 3 6] and b = (2, −1).
Solution. First note that the unknown vector x is in R3 because A has n = 3 columns. The
linear system Ax = b has m = 2 equations and n = 3 unknowns. The coefficient matrix A
has rank r = 2, and therefore
the solution set will contain d = n − r = 1 parameter. The
augmented matrix [A b] is
[A b] =
1 −1 2 2
0 3 6 −1
Let x3 = t be the parameter and use the last row to solve for x2:
x2 = −1/3 − 2t.
From the first row, x1 = 2 + x2 − 2x3 = 2 + (−1/3 − 2t) − 2t = 5/3 − 4t. The general solution is therefore
x1 = 5/3 − 4t
x2 = −1/3 − 2t
x3 = t
Recall from Definition 3.7 that the span of a set of vectors v1 , v2 , . . . , vp , which we denoted
by span{v1 , v2 , . . . , vp }, is the space of vectors that can be written as a linear combination
of the vectors v1 , v2 , . . . , vp .
Example 4.10. Is the vector b in the span of the vectors v1, v2?
b = (0, 4, 4), v1 = (3, −2, 1), v2 = (−5, 6, 1)
Solution. The vector b is in span{v1, v2} if there exist scalars x1, x2 such that
x1v1 + x2v2 = b.
Row reducing the augmented matrix [v1 v2 b], one finds the unique solution x1 = 2.5, x2 = 1.5, and indeed
2.5v1 + 1.5v2 = b.
Therefore, yes, b ∈ span{v1, v2}.
which does indeed have r = 3 leading entries. Therefore, regardless of the choice of b ∈ R3 ,
the augmented matrix [A b] will be consistent. Therefore, the vectors v1 , v2 , v3 span R3 :
span{v1 , v2 , v3 } = R3 .
Lecture 5
Homogeneous and Nonhomogeneous Systems
A linear system of the form Ax = 0, whose right-hand side is the zero vector, is called a homogeneous system. A homogeneous system Ax = 0 always has at least one solution, namely, the zero solution
because A0 = 0. A homogeneous system is therefore always consistent. The zero solution
x = 0 is called the trivial solution and any non-zero solution is called a nontrivial
solution. From the existence and uniqueness theorem (Theorem 2.5), we know that a
consistent linear system will have either one solution or infinitely many solutions. Therefore,
a homogeneous linear system has nontrivial solutions if and only if its solution set has at
least one parameter.
Recall that the number of parameters in the solution set is d = n − r, where r is the rank
of the coefficient matrix A and n is the number of unknowns.
Example 5.2. Does the linear homogeneous system have any nontrivial solutions?
3x1 + x2 − 9x3 = 0
x1 + x2 − 5x3 = 0
2x1 + x2 − 7x3 = 0
Solution. The linear system will have a nontrivial solution if the solution set has at least one
free parameter. Form the augmented matrix:
3 1 −9 0
1 1 −5 0
2 1 −7 0
Row reducing the augmented matrix, one finds that the rank of the coefficient matrix is r = 2, so the solution set has d = 3 − 2 = 1 free parameter and the system has nontrivial solutions. Setting x3 = t, the general solution is
x1 = 2t
x2 = 3t
x3 = t
In vector form, x = t(2, 3, 1). Setting v = (2, 3, 1), the solution set of the system is therefore
span{v}.
Example 5.3. Find the general solution of the homogeneous system Ax = 0 where
A =
1 2 2 1 4
3 7 7 3 13
2 5 5 2 9
Solution. Row reducing A, one finds
rref(A) =
1 0 0 1 2
0 1 1 0 1
0 0 0 0 0
so that r = 2 and there are d = n − r = 3 free parameters. Setting x3 = t1, x4 = t2, x5 = t3, back substitution gives x2 = −t1 − t3 and x1 = −t2 − 2t3. In vector form, the general solution is
x = t1(0, −1, 1, 0, 0) + t2(−1, 0, 0, 1, 0) + t3(−2, −1, 0, 0, 1) = t1v1 + t2v2 + t3v3
where t1, t2, t3 are arbitrary parameters. In other words, any solution x is in the span of
v1 , v2 , v3 :
x ∈ span{v1 , v2 , v3 }.
The form of the general solution in Example 5.3 holds in general and is summarized in
the following theorem.
Theorem 5.4: Consider the homogeneous linear system Ax = 0, where A ∈ Mm×n and 0 ∈ Rm. Let r be the rank of A. If r < n, then the general solution of Ax = 0 can be written as
x = t1v1 + t2v2 + · · · + tdvd
where d = n − r, the vectors v1, v2, . . . , vd are solutions of the homogeneous system, and t1, t2, . . . , td are arbitrary parameters. If r = n, then the only solution of Ax = 0 is the trivial solution x = 0.
In other words, every solution x of the homogeneous system lies in the span of v1, v2, . . . , vd:
x ∈ span{v1, v2, . . . , vd}.
Consider now a nonhomogeneous system Ax = b with b ≠ 0. Suppose that p is a particular solution of Ax = b, that v is any solution of the homogeneous system Ax = 0, and set q = p + v. Then
Aq = A(p + v)
= Ap + Av
= b + 0
= b
so q = p + v is also a solution of Ax = b.
Theorem 5.5: Suppose that the linear system Ax = b is consistent and let p be a
solution. Then any other solution q of the system Ax = b can be written in the form
q = p + v, for some vector v that is a solution to the homogeneous system Ax = 0.
Another way of stating Theorem 5.5 is the following: If the linear system Ax = b is consistent
and has solutions p and q, then the vector v = q−p is a solution to the homogeneous system
Ax = 0. The proof is a simple computation:
Av = A(q − p) = Aq − Ap = b − b = 0.
Combining the above, if the linear system Ax = b is consistent, then its general solution can be written as
q = p + t1v1 + t2v2 + · · · + tdvd
where p is one particular solution of Ax = b and the vectors v1, v2, . . . , vd span the solution set of the homogeneous system Ax = 0.
There is a useful geometric interpretation of the solution set of a general linear system.
We saw in Lecture 3 that we can interpret the span of a set of vectors as a plane containing
the zero vector 0. Now, the general solution of Ax = b can be written as
x = p + t1v1 + t2v2 + · · · + tdvd.
Therefore, the solution set of Ax = b is a shift of the span{v1 , v2 , . . . , vd } by the vector p.
This is illustrated in Figure 5.1.
Figure 5.1: The solution set p + span{v} of Ax = b is the line span{v} shifted by the particular solution p.
Example 5.6. Write the general solution, in parametric vector form, of the linear system
3x1 + x2 − 9x3 = 2
x1 + x2 − 5x3 = 0
2x1 + x2 − 7x3 = 1.
Solution. The RREF of the augmented matrix is:
3 1 −9 2
1 1 −5 0
2 1 −7 1
∼
1 0 −2 1
0 1 −3 −1
0 0 0 0
The system is consistent and the rank of the coefficient matrix is r = 2. Therefore, there is d = 3 − 2 = 1 free parameter in the solution set. Letting x3 = t be the parameter, from the
second row of the RREF we have
x2 = 3t − 1
And from the first row of the RREF we have
x1 = 2t + 1
Therefore, the general solution of the system in parametric vector form is
x = (2t + 1, 3t − 1, t) = (1, −1, 0) + t(2, 3, 1) = p + tv
where p = (1, −1, 0) and v = (2, 3, 1).
You should check that p = (1, −1, 0) solves the linear system Ax = b, and that v = (2, 3, 1)
solves the homogeneous system Ax = 0.
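Both verifications can also be done numerically; this is a sketch assuming NumPy, not something the notes themselves use:

import numpy as np

A = np.array([[3, 1, -9],
              [1, 1, -5],
              [2, 1, -7]])
p = np.array([1, -1, 0])
v = np.array([2, 3, 1])

print(A @ p)  # expected: [2 0 1], i.e., Ap = b
print(A @ v)  # expected: [0 0 0], i.e., Av = 0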
Example 5.7. Write the general solution, in parametric vector form, of the linear system
represented by the augmented matrix
3 −3 6 3
−1 1 −2 −1 .
2 −2 4 2
Solution.
Here n = 3, r = 1 and therefore the solution set will have d = 2 parameters. Let x3 = t1
and x2 = t2 . Then from the first row we obtain
x1 = 1 + x2 − 2x3 = 1 + t2 − 2t1
In parametric vector form, the general solution is
x = (1 + t2 − 2t1, t2, t1) = (1, 0, 0) + t1(−2, 0, 1) + t2(1, 1, 0) = p + t1v1 + t2v2.
You can verify that
Ap = b
Av1 = Av2 = 0
5.3 Summary
The material in this lecture is so important that we will summarize the main results. The
solution set of a consistent linear system Ax = b can be written in the form
x = p + t1v1 + t2v2 + · · · + tdvd
or equivalently
x ∈ p + span{v1, v2, . . . , vd}
where p is a particular solution of Ax = b and v1, v2, . . . , vd span the solution set of the homogeneous system Ax = 0.
Lecture 6
Linear Independence
Suppose that a vector x is in the span of vectors v1, v2, . . . , vn, say
x = t1v1 + t2v2 + · · · + tnvn.
A natural question that arises is whether or not there are multiple ways to express x as a
linear combination of the vectors v1 , v2 , . . . , vn . For example, if v1 = (1, 2), v2 = (0, 1),
v3 = (−1, −1), and x = (3, −1) then you can verify that x ∈ span{v1, v2, v3} and x can be written in infinitely many ways using v1, v2, v3. Here are three ways:
x = 3v1 − 7v2 + 0v3
x = 2v1 − 6v2 − v3
x = 4v1 − 8v2 + v3
The fact that x can be written in more than one way in terms of v1 , v2 , v3 suggests that there
might be a redundancy in the set {v1 , v2 , v3 }. In fact, it is not hard to see that v3 = −v1 +v2 ,
and thus v3 ∈ span{v1 , v2 }. The preceding discussion motivates the following definition.
Definition 6.1: A set of vectors {v1, v2, . . . , vn} is said to be linearly dependent if at least one of the vectors can be written as a linear combination of the others. Otherwise, the set is said to be linearly independent.
For instance, a linear dependence such as v3 = −2v1 + v2 can be rewritten as
2v1 − v2 + v3 = 0.
Hence, because {v1, v2, v3} is a linearly dependent set, it is possible to write the zero vector 0 as a linear combination of {v1, v2, v3} where not all the coefficients in the linear combination are zero. This leads to the following characterization of linear independence.
Theorem 6.3: The set of vectors {v1 , v2 , . . . , vn } is linearly independent if and only if 0
can be written in only one way as a linear combination of {v1 , v2 , . . . , vn }. In other words,
if
t1 v1 + t2 v2 + · · · + tn vn = 0
then necessarily the coefficients t1 , t2 , . . . , tn are all zero.
More generally, linear independence guarantees uniqueness of linear combinations. Suppose that a vector x can be written in two ways as a linear combination of linearly independent vectors v1, v2, . . . , vn:
r1v1 + r2v2 + · · · + rnvn = x
s1v1 + s2v2 + · · · + snvn = x.
Subtracting the two equations gives
(r1 − s1)v1 + (r2 − s2)v2 + · · · + (rn − sn)vn = 0
and therefore, by linear independence, ri = si for every i; that is, the two linear combinations are in fact the same.
Theorem 6.4: Let A = [v1 v2 · · · vn]. The set {v1, v2, . . . , vn} is linearly independent if and only if the rank of A is r = n, that is, if the number of leading entries r in the REF (or RREF) of A is exactly n.
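Theorem 6.4 translates directly into a computation; the following sketch (not from the notes, assuming NumPy) tests the vectors v1 = (1, 2, 1), v2 = (1, 1, 0), v3 = (2, 1, 2) from Lecture 3:

import numpy as np

# Stack the vectors as the columns of A and compare rank(A) with n.
A = np.column_stack([[1, 2, 1], [1, 1, 0], [2, 1, 2]])
r = np.linalg.matrix_rank(A)
print(r == A.shape[1])  # True exactly when the set is linearly independent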
spans the solution set of the system Ax = 0. Choosing for instance t = 2 we obtain the
solution
x = t(2, −1, 1), and choosing t = 2 gives x = (4, −2, 2).
Therefore,
4v1 − 2v2 + 2v3 = 0
is a non-trivial linear combination of v1 , v2 , v3 that gives the zero vector 0. And, for instance,
v3 = −2v1 + v2
Below we record some simple observations on the linear independence of simple sets:
• A set consisting of two non-zero vectors {v1 , v2 } is linearly independent if and only if
neither of the vectors is a multiple of the other. For example, if v2 = tv1 then
tv1 − v2 = 0
is a linear combination, with not all coefficients zero, that equals the zero vector; hence the set is linearly dependent.
• Any set {v1 , v2 , . . . , vp } containing the zero vector, say that vp = 0, is linearly depen-
dent. For example, the linear combination
0v1 + 0v2 + · · · + 0vp−1 + 1vp = 0
produces the zero vector even though the coefficient of vp is nonzero.
The following result places a restriction on the size of a linearly independent set: in Rn, any set of more than n vectors is automatically linearly dependent.
Proof. Let A = [v1 v2 · · · vp], where v1, . . . , vp ∈ Rn and p > n. Thus, A is an n × p matrix. Since A has n rows, the maximum rank of A is n, that is, r ≤ n. Therefore, the number of free parameters d = p − r is always positive because p > n ≥ r. Thus, the homogeneous system Ax = 0 has non-trivial solutions. In other words, there is some non-zero vector x ∈ Rp such that
Ax = x1v1 + x2v2 + · · · + xpvp = 0
For example, the columns of the 2 × 5 matrix
−2 6 1 3 7
0 0 0 1 −2
form a set of five vectors in R2, which must therefore be linearly dependent. One solution to the linear system Ax = 0 is x = (−1, 1, 0, −2, −1) and therefore
−v1 + v2 − 2v4 − v5 = 0.
Example 6.9. Suppose that the set {v1 , v2 , v3 , v4 } is linearly independent. Show that the
set {v1 , v2 , v3 } is also linearly independent.
Solution. We must argue that if there exist scalars x1, x2, x3 such that
x1 v1 + x2 v2 + x3 v3 = 0
then necessarily x1, x2, x3 are all zero. Suppose then that there exist scalars x1, x2, x3 such that
x1 v1 + x2 v2 + x3 v3 = 0.
Then clearly it holds that
x1 v1 + x2 v2 + x3 v3 + 0v4 = 0.
But the set {v1 , v2 , v3 , v4 } is linearly independent, and therefore, it is necessary that x1 , x2 , x3
are all zero. This proves that v1 , v2 , v3 are also linearly independent.
Lecture 7
Introduction to Linear Mappings
A function T that takes vectors x in Rn as inputs and produces vectors T(x) in Rm as outputs is called a vector mapping, and we write
T : Rn → Rm.
The set Rn is called the domain of T, the set Rm is called the co-domain of T, and the set of all attained outputs T(x) is called the range of T.
In other words, b is in the range of T if there is an input x in the domain of T that outputs
b = T(x). In general, not every point in the co-domain of T is in the range of T. For
example, consider the vector mapping T : R2 → R2 defined as
T(x1, x2) = (x1^2 sin(x2) − cos(x1^2 − 1), x1^2 + x2^2 + 1).
The vector b = (3, −1) is not in the range of T because the second component of T(x) is
positive. On the other hand, b = (−1, 2) is in the range of T because
T(1, 0) = (1^2 sin(0) − cos(1^2 − 1), 1^2 + 0^2 + 1) = (−1, 2) = b.
Hence, a corresponding input for this particular b is x = (1, 0). In Figure 7.1 we illustrate
the general setup of how the domain, co-domain, and range of a mapping are related. A
crucial idea is that the range of T may not equal the co-domain.
Figure 7.1: The domain Rn, the co-domain Rm, and the range of a mapping T; the range is the set of outputs T(x) and is generally smaller than the co-domain.
For our purposes, vector mappings T : Rn → Rm can be organized into two categories: (1)
linear mappings and (2) nonlinear mappings.
Definition 7.2: A mapping T : Rn → Rm is called a linear mapping if
(1) T(u + v) = T(u) + T(v) for all u, v ∈ Rn, and
(2) T(cu) = cT(u) for all u ∈ Rn and all scalars c.
For example, consider again the vector mapping
T(x1, x2) = (x1^2 sin(x2) − cos(x1^2 − 1), x1^2 + x2^2 + 1)
for which we computed
T(1, 0) = (−1, 2).
If T were linear then by property (2) of Definition 7.2 the following must hold:
T(3, 0) = T(3(1, 0)) = 3T(1, 0) = 3(−1, 2) = (−3, 6).
However,
T(3, 0) = (3^2 sin(0) − cos(3^2 − 1), 3^2 + 0^2 + 1) = (−cos(8), 10) ≠ (−3, 6).
Therefore, T is not a linear mapping.
Therefore, both conditions of Definition 7.2 hold, and thus T is a linear map.
Example 7.4. Let α ≥ 0 and define the mapping T : Rn → Rn by the formula T(x) = αx.
If 0 ≤ α ≤ 1 then T is called a contraction and if α > 1 then T is called a dilation. In
either case, show that T is a linear mapping.
This shows that condition (1) in Definition 7.2 holds. To show that the second condition holds, let c be any number. Then
Therefore, both conditions of Definition 7.2 hold, and thus T is a linear mapping. To see a particular example, consider the case α = 1/2 and n = 3. Then
T(x) = (1/2)x = ((1/2)x1, (1/2)x2, (1/2)x3).
Given a matrix A ∈ Mm×n, define the mapping T : Rn → Rm by the formula
T(x) = Ax.
Such a mapping T will be called a matrix mapping corresponding to A and when con-
venient we will use the notation TA to indicate that TA is associated to A. We proved in
Lecture 4 (Theorem 4.3), that for any u, v ∈ Rn , and scalar c, matrix-vector multiplication
satisfies the properties:
1. A(u + v) = Au + Av
2. A(cu) = cAu.
Example. Let T : R2 → R3 be defined by T(x1, x2) = (2x1 − x2, x1 + x2, −x1 − 3x2). Find a matrix A such that T(x) = Ax.
Solution. In Example 7.3 we showed that T is a linear mapping using Definition 7.2. Alter-
natively, we observe that T is a mapping defined using matrix-vector multiplication because
T(x1, x2) = (2x1 − x2, x1 + x2, −x1 − 3x2) = Ax
where
A =
2 −1
1 1
−1 −3
Let TA : Rn → Rm be a matrix mapping, that is, TA (x) = Ax. We proved that the
output vector Ax is a linear combination of the columns of A where the coefficients in the
linear combination are the components of x. Explicitly, if A = [v1 v2 · · · vn] and x = (x1, x2, . . . , xn), then
Ax = x1 v1 + x2 v2 + · · · + xn vn .
Therefore, the range of the matrix mapping TA(x) = Ax is
Range(TA ) = span{v1 , v2 , . . . , vn }.
In words, the range of a matrix mapping is the span of its columns. Therefore, if v1 , v2 , . . . , vn
span all of Rm then every vector b ∈ Rm is in the range of TA .
7.4 Examples
If T : Rn → Rm is a linear mapping, then for any vectors v1, v2, . . . , vp and scalars c1, c2, . . . , cp, it holds that
T(c1v1 + c2v2 + · · · + cpvp) = c1T(v1) + c2T(v2) + · · · + cpT(vp). (⋆)
Therefore, if all you know are the values T(v1 ), T(v2 ), . . . , T(vp ) and T is linear, then you
can compute T(v) for every
v ∈ span{v1 , v2 , . . . , vp }.
Example 7.10. (Rotations) Let Tθ : R2 → R2 be the mapping on the 2D plane that rotates
every v ∈ R2 by an angle θ. Write down a formula for Tθ and show that Tθ is a linear
mapping.
[Figure: a vector v making angle α with the x-axis, and its image Tθ(v) after rotation by the angle θ.]
If we scale v by any c > 0 then performing the same computation as above we obtain that Tθ(cv) = cTθ(v). Therefore, Tθ is a matrix mapping with corresponding matrix
A =
cos(θ) −sin(θ)
sin(θ) cos(θ)
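A small numerical sketch of the rotation mapping (assuming NumPy; not part of the notes):

import numpy as np

def rotation(theta):
    # Standard matrix of the rotation of R^2 by angle theta (radians).
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])

# Rotating e1 = (1, 0) by 90 degrees gives (0, 1), up to rounding.
print(np.round(rotation(np.pi / 2) @ np.array([1.0, 0.0]), 8))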
Consider next the mapping T : R3 → R3 defined by T(x1, x2, x3) = (x1, x2, 0). Both conditions of Definition 7.2 are straightforward to verify, and therefore T is a linear mapping. Geometrically, T takes the vector x and projects it to the (x1, x2) plane, see Figure 7.2. What is the range of T? The range of T consists of all vectors in R3 of the form
b = (t, s, 0)
where the numbers t and s are arbitrary. For each b in the range of T, there are infinitely
many x’s such that T(x) = b.
Figure 7.2: Projection onto the (x1 , x2 ) plane
Lecture 8
Onto, One-to-One, and Standard Matrix
A mapping T : Rn → Rm is said to be onto if for every b ∈ Rm there is at least one x ∈ Rn such that T(x) = b, that is, if the range of T is all of Rm.
For a matrix mapping TA (x) = Ax, the range of TA is the span of the columns of A.
Therefore:
Theorem 8.3: Let TA : Rn → Rm be the matrix mapping TA (x) = Ax, where A ∈ Rm×n .
Then TA is onto if and only if r = rank(A) = m.
Example. Determine whether the matrix mapping TA : R4 → R3 is onto, where
A =
1 2 −1 4
−1 4 1 8
2 0 −2 0
Solution. The rref(A) is
1 0 −1 0
0 1 0 2
0 0 0 0
Since r = rank(A) = 2 < 3 = m, TA is not onto. In fact, the columns of A span only a plane in R3:
span{v1, v2, v3, v4} = span{v1, v2} ≠ R3.
Below is a theorem which places restrictions on the size of the domain of an onto mapping.
Theorem 8.6: If TA : Rn → Rm is onto, then m ≤ n.
Proof. If TA is onto then the rref(A) has r = m leading 1's. Therefore, A has at least m columns. The number of columns of A is n. Therefore, m ≤ n.
An equivalent way of stating Theorem 8.6 is the following: if m > n, then TA is not onto.
When T is a linear mapping, we have all the tools necessary to give a complete description
of when T is one-to-one. To do this, we use the fact that if T : Rn → Rm is linear then
T(0) = 0. Here is one proof: T(0) = T(x − x) = T(x) − T(x) = 0.
Theorem 8.11: Let TA : Rn → Rm be a matrix mapping, where A = [v1 v2 · · · vn]. The following statements are equivalent:
1. TA is one-to-one.
2. The columns v1, v2, . . . , vn of A are linearly independent.
3. rank(A) = n.
Example. Suppose that A is a 3 × 4 matrix. Is TA one-to-one?
Solution. By Theorem 8.11, TA is one-to-one if and only if the columns of A are linearly
independent. The columns of A lie in R3 and there are n = 4 columns. From Lecture 6, we
know then that the columns are not linearly independent. Therefore, TA is not one-to-one.
Alternatively, A will have rank at most r = 3 (why?). Therefore, the solution set to Ax = 0
will have at least one parameter, and thus there exists infinitely many solutions to Ax = 0.
Intuitively, because R4 is “larger” than R3 , the linear mapping TA will have to project R4
onto R3 and thus infinitely many vectors in R4 will be mapped to the same vector in R3 .
Example. Suppose that A is a 3 × 2 matrix whose columns are not multiples of each other. Is TA one-to-one?
Solution. By inspection, we see that the columns of A are linearly independent. Therefore,
TA is one-to-one. Alternatively, one can compute that
rref(A) =
1 0
0 1
0 0
Let e1, e2, . . . , en denote the standard unit vectors in Rn; that is, ei has a 1 in the ith entry and zeros elsewhere. Any vector x = (x1, x2, . . . , xn) in Rn can be written as
x = x1e1 + x2e2 + · · · + xnen.
Now suppose that T : Rn → Rm is a linear mapping, and let vi = T(ei). Then
T(x) = T(x1e1 + x2e2 + · · · + xnen)
= x1T(e1) + x2T(e2) + · · · + xnT(en)
= x1v1 + x2v2 + · · · + xnvn
= [v1 v2 · · · vn]x.
Define the matrix A ∈ Mm×n by A = [v1 v2 · · · vn]. Then our computation above
shows that
T(x) = x1 v1 + x2 v2 + · · · + xn vn = Ax.
Therefore, T is a matrix mapping with the matrix A ∈ Mm×n .
The matrix
A = [T(e1) T(e2) · · · T(en)]
is called the standard matrix of T. In words, the columns of A are the images of the
standard unit vectors e1 , e2 , . . . , en under T. The punchline is that if T is a linear mapping,
then to derive properties of T we need only know the standard matrix A corresponding to
T.
[Figure: the standard unit vectors e1, e2 and their images Tθ(e1), Tθ(e2) under rotation by the angle θ.]
For example, consider the dilation T : R3 → R3 given by T(x) = 2x. Then T(e1) = 2e1, T(e2) = 2e2, and T(e3) = 2e3. Therefore,
A = [T(e1) T(e2) T(e3)] =
2 0 0
0 2 0
0 0 2
is the standard matrix of T.
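The recipe "apply T to each standard unit vector" is easy to automate. Here is a sketch (not from the notes, assuming NumPy) in which T is the dilation above, standing in for any linear mapping implemented as a function:

import numpy as np

def T(x):
    return 2 * x  # the dilation T(x) = 2x

n = 3
# The rows of np.eye(n) are the standard unit vectors e1, ..., en.
A = np.column_stack([T(e) for e in np.eye(n)])
print(A)  # expected: the 3 x 3 matrix with 2's on the diagonal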
In this lecture, we learned:
• the relationship between the range of a matrix mapping T(x) = Ax and the span of
the columns of A
• what it means for a mapping to be onto and one-to-one
• how to verify if a linear mapping is onto and one-to-one
• that all linear mappings are matrix mappings
• what the standard unit vectors are
• how to compute the standard matrix of a linear mapping
Lecture 9
Matrix Algebra
Theorem 9.4: Let A, B, C be matrices of the same size and let α, β be scalars. Then
(1) A + B = B + A
(2) (A + B) + C = A + (B + C)
(3) α(A + B) = αA + αB
(4) (α + β)A = αA + βA
Consider now two matrix mappings TA : Rn → Rm and TB : Rp → Rn. The composition TA ◦ TB, which maps x ∈ Rp to TA(TB(x)) ∈ Rm, is itself a linear mapping, so it has a standard matrix C. The first column of C is (TA ◦ TB)(e1) = A(Be1). Now Be1 is
Be1 = [b1 b2 · · · bp]e1 = b1.
Definition 9.5: For A ∈ Rm×n and B ∈ Rn×p, with B = [b1 b2 · · · bp], we define the product AB by the formula
AB = [Ab1 Ab2 · · · Abp].
The product AB is defined only when the number of columns of A equals the number of
rows of B. The following diagram is useful for remembering this:
(m × n) · (n × p) → m × p
From our definition of AB, the standard matrix of the composite mapping TA ◦ TB is
C = AB.
Example. Compute the product AB, where
A =
1 2 −2
1 1 −3
B =
−4 2 4 −4
−1 −5 −3 3
−4 −4 −3 −1
Solution. Computing one column at a time, Ab1 = (2, 7), Ab2 = (0, 9), Ab3 = (4, 10), and Ab4 = (4, 2). Therefore,
AB =
2 0 4 4
7 9 10 2
On the other hand, BA is not defined! B has 4 columns and A has 2 rows.
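The connection between composition and the matrix product can be checked numerically, using the matrices A and B above (a supplementary sketch, not from the notes):

import numpy as np

A = np.array([[1, 2, -2],
              [1, 1, -3]])
B = np.array([[-4, 2, 4, -4],
              [-1, -5, -3, 3],
              [-4, -4, -3, -1]])
x = np.array([1, 0, 0, 0])

print(A @ (B @ x))  # TA(TB(x))
print((A @ B) @ x)  # the same vector: [2 7]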
As another example, let
A =
−4 4 3
3 −3 −1
−2 −1 1
B =
−1 −1 0
−3 0 −2
−2 1 −2
Computing one column at a time, Ab1 = (−14, 8, 3), Ab2 = (7, −4, 3), and Ab3 = (−14, 8, 0). Therefore,
AB =
−14 7 −14
8 −4 8
3 3 0
An important matrix that arises frequently is the identity matrix In ∈ Rn×n of size
n:
In =
1 0 · · · 0
0 1 · · · 0
...
0 0 · · · 1
You should verify that for any A ∈ Rn×n it holds that AIn = In A = A. Below are some
basic algebraic properties of matrix multiplication.
(1) A(BC) = (AB)C
(2) A(B + C) = AB + AC
(3) (B + C)A = BA + CA
(4) α(AB) = (αA)B = A(αB), for any scalar α
Given a square matrix A ∈ Rn×n and a positive integer k, we define the kth power of A:
A^k = AA · · · A (k times)
Definition 9.10: Given a matrix A ∈ Rm×n , the transpose of A is the matrix AT whose
ith column is the ith row of A.
Next compute BTAT:
BT =
−2 −1 0
1 −2 0
2 0 −1
AT =
−2 3
1 −1
0 −3
BTAT =
3 −5
−4 5
−4 9
which equals (AB)T.
Theorem 9.12: Let A and B be matrices of appropriate sizes. The following hold:
(1) (AT )T = A
(2) (A + B)T = AT + BT
(3) (αA)T = αAT
(4) (AB)T = BT AT
Example 9.13. Let T : R2 → R2 be the linear mapping that first contracts vectors by a
factor of k = 3 and then rotates by an angle θ. What is the standard matrix A of T?
Solution. Let e1 = (1, 0) and e2 = (0, 1) denote the standard unit vectors in R2. From Lecture 8, the standard matrix of T is A = [T(e1) T(e2)]. Recall that the standard matrix of a rotation by θ is
of a rotation by θ is
cos(θ) − sin(θ)
sin(θ) cos(θ)
Contracting e1 by a factor of k = 3 results in (1/3, 0), and then rotating by θ results in
T(e1) = ((1/3) cos(θ), (1/3) sin(θ)).
Therefore,
A = [T(e1) T(e2)] =
(1/3)cos(θ) −(1/3)sin(θ)
(1/3)sin(θ) (1/3)cos(θ)
On the other hand, the standard matrix corresponding to a contraction by a factor k = 3 is
1/3 0
0 1/3
and multiplying the rotation matrix with this contraction matrix reproduces the matrix A above.
Lecture 10
Invertible Matrices
A matrix A ∈ Rn×n is said to be invertible if there exists a matrix C ∈ Rn×n such that CA = AC = In. If A is invertible, can it have more than one inverse? Suppose that there exist C1, C2 such that ACi = CiA = In. Then
C1 = C1In = C1(AC2) = (C1A)C2 = InC2 = C2.
Thus, if A is invertible, it can have only one inverse. This motivates the following definition.
Definition 10.2: If A is invertible, the unique matrix C satisfying AC = CA = In is called the inverse of A and is denoted by A−1.
Example 10.3. Let
A =
1 −3 0
−1 2 −2
−2 6 1
C =
−14 −3 −6
−5 −1 −2
2 0 1
Verify that C is the inverse of A.
Solution. Compute CA:
CA =
1 0 0
0 1 0
0 0 1
A similar computation shows that AC = I3, and therefore C = A−1.
Theorem 10.4: Let A ∈ Rn×n and suppose that A is invertible. Then for any b ∈ Rn
the matrix equation Ax = b has a unique solution given by A−1 b.
Proof: Let b ∈ Rn be arbitrary. Then multiplying the equation Ax = b by A−1 from the
left we obtain that
A−1 Ax = A−1 b
⇒ In x = A−1 b
⇒ x = A−1 b.
To verify that A−1b is indeed a solution, compute
Ax = A(A−1b) = AA−1b = Inb = b
and thus x = A−1 b is a solution. If x̃ is another solution of the equation, that is, Ax̃ = b,
then multiplying both sides by A−1 we obtain that x̃ = A−1 b. Thus, x = x̃.
Example 10.5. Use the result of Example 10.3. to solve the linear system Ax = b if
A =
1 −3 0
−1 2 −2
−2 6 1
b =
1
−3
−1
Solution. Using the inverse A−1 = C found in Example 10.3, the unique solution is
x = A−1b = (1, 0, 1).
Verify:
Ax = (1(1) − 3(0) + 0(1), −1(1) + 2(0) − 2(1), −2(1) + 6(0) + 1(1)) = (1, −3, −1) = b.
The following summarizes the relationship between the matrix inverse, matrix multiplication, and the matrix transpose: if A and B are invertible n × n matrices, then
(A−1)−1 = A
(AB)−1 = B−1A−1
(AT)−1 = (A−1)T.
In summary, to determine if A−1 exists and to simultaneously compute it, we compute the RREF of the augmented matrix
[A In],
that is, A augmented with the n × n identity matrix. If the RREF of A is In, that is,
[A In] ∼ [In c1 c2 · · · cn],
then
A−1 = [c1 c2 · · · cn].
If the RREF of A is not In then A is not invertible.
Example 10.7. Find the inverse of
A =
1 3
−1 −2
if it exists.
Solution. Form the augmented matrix [A I2] and row reduce. Add R1 to R2:
1 3 1 0
0 1 1 1
Add −3R2 to R1:
1 0 −2 −3
0 1 1 1
Therefore, rref(A) = I2 and
A−1 =
−2 −3
1 1
Verify:
AA−1 =
1 0
0 1
Example 10.8. Find the inverse of
A =
1 0 3
1 1 0
−2 0 −7
if it exists.
Solution. Form the augmented matrix [A I3] and row reduce. Add −R1 to R2, then add 2R1 to R3:
1 0 3 1 0 0
0 1 −3 −1 1 0
0 0 −1 2 0 1
Multiply R3 by −1:
1 0 3 1 0 0
0 1 −3 −1 1 0
0 0 1 −2 0 −1
Add 3R3 to R2 and −3R3 to R1:
1 0 0 7 0 3
0 1 0 −7 1 −3
0 0 1 −2 0 −1
Therefore, rref(A) = I3, and therefore A is invertible. The inverse is
A−1 =
7 0 3
−7 1 −3
−2 0 −1
Verify:
AA−1 =
1 0 0
0 1 0
0 0 1
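As a supplementary check (not from the notes), numpy.linalg.inv reproduces the inverse computed by row reduction:

import numpy as np

A = np.array([[1.0, 0.0, 3.0],
              [1.0, 1.0, 0.0],
              [-2.0, 0.0, -7.0]])
Ainv = np.linalg.inv(A)
print(np.round(Ainv))      # expected: [[ 7.  0.  3.] [-7.  1. -3.] [-2.  0. -1.]]
print(np.round(A @ Ainv))  # expected: the 3 x 3 identity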
Example 10.9. Find the inverse of
A =
1 0 1
1 1 −2
−2 0 −2
if it exists.
Solution. Form the augmented matrix [A I3] and row reduce. Add −R1 to R2, then add 2R1 to R3:
1 0 1 1 0 0
0 1 −3 −1 1 0
0 0 0 2 0 1
We need not go further since the rref(A) is not I3 (rank(A) = 2). Therefore, A is not invertible.
Suppose now that A is invertible, and consider the matrix mappings TA(x) = Ax and TA−1(x) = A−1x. The standard matrix of the composition (TA−1 ◦ TA) is
A−1A = In.
Similarly, the standard matrix of (TA ◦ TA−1 ) is also In . Intuitively, the linear mapping TA−1
undoes what TA does, and conversely. Moreover, since Ax = b always has a solution, TA is
onto. And, because the solution to Ax = b is unique, TA is one-to-one.
Theorem 10.10: Let A ∈ Rn×n. The following statements are equivalent:
(a) A is invertible.
(b) rref(A) = In, that is, rank(A) = n.
(c) The columns of A are linearly independent and span Rn.
(d) For every b ∈ Rn, the equation Ax = b has a unique solution.
(e) TA is one-to-one and onto.
Proof: This is a summary of all the statements we have proved about matrices and matrix
mappings specialized to the case of square matrices A ∈ Rn×n . Note that for non-square
matrices, one-to-one does not imply ontoness, and conversely.
Example 10.11. Without doing any arithmetic, write down the inverse of the dilation matrix
A =
3 0
0 5
Example 10.12. Without doing any arithmetic, write down the inverse of the rotation matrix
A =
cos(θ) −sin(θ)
sin(θ) cos(θ)
Lecture 11
Determinants
a11 x1 + a12 x2 = b1
a21 x1 + a22 x2 = b2 .
Using elimination, one finds that this 2 × 2 system has a unique solution precisely when the quantity
D = a11a22 − a12a21
is nonzero; D is called the determinant of the coefficient matrix. An analogous computation for a 3 × 3 linear system leads to the quantity
D = a11(a22a33 − a23a32) − a12(a21a33 − a23a31) + a13(a21a32 − a22a31).
Let A11 = [a22 a23; a32 a33], A12 = [a21 a23; a31 a33], and A13 = [a21 a22; a31 a32]. Then we can write
D = a11 det(A11) − a12 det(A12) + a13 det(A13).
The matrix A11 = [a22 a23; a32 a33] is obtained from A by deleting the 1st row and the 1st column:
A =
a11 a12 a13
a21 a22 a23
a31 a32 a33
−→ A11 =
a22 a23
a32 a33
Similarly, the matrix A12 = [a21 a23; a31 a33] is obtained from A by deleting the 1st row and the 2nd column, and the matrix A13 = [a21 a22; a31 a32] is obtained from A by deleting the 1st row and the 3rd column.
Notice also that the signs in front of the coefficients a11, a12, and a13 alternate. This motivates the following definition.
Definition 11.3: Let A be a 3 × 3 matrix. Let Ajk be the 2 × 2 matrix obtained from
A by deleting the jth row and kth column. Define the cofactor of ajk to be the number
Cjk = (−1)^(j+k) det Ajk. Define the determinant of A to be
det A = a11C11 + a12C12 + a13C13.
This definition of the determinant is called the expansion of the determinant along the
first row. In the cofactor Cjk = (−1)^(j+k) det Ajk, the expression (−1)^(j+k) will evaluate to either 1 or −1, depending on whether j + k is even or odd. For example, the cofactor of a12 is
C12 = (−1)^(1+2) det A12 = − det A12
and the cofactor of a13 is
C13 = (−1)^(1+3) det A13 = det A13.
We can also compute the cofactor of the other entries of A in the obvious way. For example,
the cofactor of a23 is
C23 = (−1)^(2+3) det A23 = − det A23.
A helpful way to remember the sign (−1)^(j+k) of a cofactor is to use the matrix
+ − +
− + − .
+ − +
This works not just for 3 × 3 matrices but for any square n × n matrix.
Example 11.4. Compute the determinant of the matrix
A =
4 −2 3
2 3 5
1 0 6
Solution. Expanding along the first row:
det A = (4) det[3 5; 0 6] − (−2) det[2 5; 1 6] + (3) det[2 3; 1 0]
= 4(18) + 2(7) + 3(−3)
= 72 + 14 − 9
= 77
We can compute the determinant of a matrix A by expanding along any row or column.
For example, the expansion of the determinant of the matrix
A =
a11 a12 a13
a21 a22 a23
a31 a32 a33
along the second row is det A = a21C21 + a22C22 + a23C23, and along the third column it is det A = a13C13 + a23C23 + a33C33.
The punchline is that any way you choose to expand (row or column) you will get the same
answer. If a particular row or column contains zeros, say entry ajk , then the computation of
the determinant is simplified if you expand along either row j or column k because ajk Cjk = 0
and we need not compute Cjk .
Example. Compute the determinant of the matrix A of Example 11.4 by expanding along the third row.
Solution. In Example 11.4, we computed det(A) = 77 by expanding along the 1st row.
Expanding instead along the third row:
det A = (1) det A31 − (0) det A32 + (6) det A33
= det[−2 3; 3 5] + 6 det[4 −2; 2 3]
= −19 + 96
= 77
The next theorem tells us that we can compute the determinant by expanding along any row or column: for any fixed j and any fixed k,
det A = aj1Cj1 + aj2Cj2 + · · · + ajnCjn = a1kC1k + a2kC2k + · · · + ankCnk.
Corollary 11.8: If A has a row or column containing all zeros then det A = 0.
Proof. If the jth row contains all zeros then aj1 = aj2 = · · · = ajn = 0, and expanding along the jth row gives
det A = aj1Cj1 + aj2Cj2 + · · · + ajnCjn = 0.
Corollary 11.9: For any square matrix A it holds that det A = det AT .
Sketch of the proof. Expanding along the jth row of A is equivalent to expanding along
the jth column of AT .
Example. Compute the determinant of the matrix
A =
1 3 0 −2
1 2 −2 −1
0 0 2 1
−1 −3 1 0
Solution. The third row contains two zeros, so expand along this row:
det A = (2)(−1)^(3+3) det A33 + (1)(−1)^(3+4) det A34
= 2 det[1 3 −2; 1 2 −1; −1 −3 0] − det[1 3 0; 1 2 −2; −1 −3 1]
= 2(2) − (−1)
= 5
Theorem 11.13: The determinant of a triangular matrix is the product of its diagonal
entries.
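A quick numerical illustration of Theorem 11.13 (a sketch assuming NumPy; not part of the notes):

import numpy as np

U = np.array([[2.0, 5.0, -1.0],
              [0.0, 3.0, 4.0],
              [0.0, 0.0, -2.0]])
print(np.linalg.det(U))     # expected: -12.0, up to floating-point error
print(np.prod(np.diag(U)))  # the product of the diagonal entries: -12.0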
Lecture 12
Properties of the Determinant
Theorem 12.1: Suppose that A ∈ Rn×n and let B be the matrix obtained by interchang-
ing two rows of A. Then det B = − det A.
Proof. Consider the 2 × 2 case. Let A = [a11 a12; a21 a22] and let B = [a21 a22; a11 a12]. Then
det B = a12a21 − a11a22 = −(a11a22 − a12a21) = − det A.
The general case is proved by induction.
Corollary 12.2: If A ∈ Rn×n has two rows (or two columns) that are equal then
det(A) = 0.
Proof. Suppose that A has rows j and k that are equal. Let B be the matrix obtained by
interchanging rows j and k. Then by the previous theorem det B = − det A. But clearly
B = A, and therefore det B = det A. Therefore, det(A) = − det(A) and thus det A = 0.
Now we consider how the determinant behaves under elementary row operations of Type
2.
Theorem 12.3: Let A ∈ Rn×n and let B be the matrix obtained by multiplying a row of
A by β. Then det B = β det A.
Proof. Suppose that B is obtained from A by multiplying the jth row by β. The rows of A
and B different from j are equal, and therefore the submatrices Ajk and Bjk obtained by deleting the jth row are equal. In particular, the (j, k) cofactors of A and B are equal. The jth row of B is βaj. Then, expanding det B along the jth row:
det B = (βaj) · cj^T
= β(aj · cj^T)
= β det A.
Theorem 12.4: Let A ∈ Rn×n and let B be the matrix obtained from A by adding β
times the kth row to the jth row. Then det B = det A.
Proof. For any matrix A and any row vector r = [r1 r2 · · · rn], the expression
r · cj^T = r1Cj1 + r2Cj2 + · · · + rnCjn
is the determinant of the matrix obtained from A by replacing the jth row with the row r. Therefore, if k ≠ j then
ak · cj^T = 0
since then rows k and j are equal. The jth row of B is bj = aj + βak. Therefore, expanding det B along the jth row:
det B = (aj + βak) · cj^T
= aj · cj^T + β(ak · cj^T)
= det A + β · 0
= det A.
Example 12.5. Suppose that A is a 4 × 4 matrix and suppose that det A = 11. If B is
obtained from A by interchanging rows 2 and 4, what is det B?
Solution. Interchanging (or swapping) rows changes the sign of the determinant. Therefore,
det B = −11.
Example 12.6. Suppose that A is a 4 × 4 matrix and suppose that det A = 11. Let
a1 , a2 , a3 , a4 denote the rows of A. If B is obtained from A by replacing row a3 by 3a1 + a3 ,
what is det B?
Solution. This is a Type 3 elementary row operation, which preserves the value of the de-
terminant. Therefore,
det B = 11.
Example 12.7. Suppose that A is a 4 × 4 matrix and suppose that det A = 11. Let
a1 , a2 , a3 , a4 denote the rows of A. If B is obtained from A by replacing row a3 by 3a1 + 7a3 ,
what is det B?
Solution. This is not quite a Type 3 elementary row operation because a3 is multiplied by
7. The third row of B is b3 = 3a1 + 7a3. Therefore, expanding det B along the third row:
det B = (3a1 + 7a3) · c3^T
= 3(a1 · c3^T) + 7(a3 · c3^T)
= 7(a3 · c3^T)
= 7 det A
= 77
Example 12.8. Suppose that A is a 4 × 4 matrix and suppose that det A = 11. Let
a1 , a2 , a3 , a4 denote the rows of A. If B is obtained from A by replacing row a3 by 4a1 + 5a2 ,
what is det B?
Solution. Again, this is not a Type 3 elementary row operation. The third row of B is
b3 = 4a1 + 5a2. Therefore, expanding det B along the third row:
det B = (4a1 + 5a2) · c3^T
= 4(a1 · c3^T) + 5(a2 · c3^T)
= 0 + 0
= 0
Theorem 12.9: Let A ∈ Rn×n. Then A is invertible if and only if det A ≠ 0.
Proof. Beginning with the matrix A, perform elementary row operations and generate a sequence of matrices A1, A2, . . . , Ap such that Ap is in row echelon form and thus triangular:
A ∼ A1 ∼ A2 ∼ · · · ∼ Ap.
Thus, matrix Ai is obtained from Ai−1 by performing one of the elementary row operations. From Theorems 12.1, 12.3, 12.4, if det Ai−1 ≠ 0 then det Ai ≠ 0. In particular, det A = 0 if and only if det Ap = 0. Now, Ap is triangular and therefore its determinant is the product of its diagonal entries. If all the diagonal entries are non-zero then det A = det Ap ≠ 0. In this case, A is invertible because there are r = n leading entries in Ap. If a diagonal entry of Ap is zero then det A = det Ap = 0. In this case, A is not invertible because there are r < n leading entries in Ap. Therefore, A is invertible if and only if det A ≠ 0.
Theorem 12.10: Let A ∈ Rn×n and let B = βA, that is, B is obtained by multiplying
every entry of A by β. Then det B = β n det A.
100
Lecture 12
= β 2 det A.
= β 3 det A.
det(3A) = 34 det A
= 81 · 11
= 891
The following theorem characterizes how the determinant behaves under matrix multi-
plication.
101
Properties of the Determinant
Therefore
det(A) det(A−1 ) = 1
or equivalently
1
det A−1 = .
det A
102
Lecture 13
Lecture 13
where Cjk = (−1)j+k det Ajk is called the (j, k)-Cofactor of A and
aj = aj1 aj2 · · · ajn
is the jth row of A. If cj = Cj1 Cj2 · · · Cjn then
Cj1
C
j2
det A = aj1 aj2 · · · ajn .. = aj · cTj .
.
Cjn
Suppose that B is the matrix obtained from A by replacing row aj with a distinct row ak .
To compute det B expand along its jth row bj = ak :
det B = ak · cTj = 0.
The Cofactor Method is an alternative method to find the inverse of an invertible matrix.
Recall that for any matrix A ∈ Rn×n , if we expand along the jth row then
det A = aj · cTj .
103
Applications of the Determinant
Then,
a1
a2
T
A(Cof(A)) = .. cT1 cT2 · · · cTn
.
an
a1 cT1 a1 cT2 · · · a1 cTn
a cT a cT · · · a cT
2 1 2 2 2 n
=
.. .. .. ..
. . . .
T T T
an c1 an c2 · · · an cn
det A 0 ··· 0
0 det A · · · 0
=
.. .. .. ..
. . . .
0 0 · · · det A
A(Cof(A))T = det(A)In .
1
A−1 = (Cof(A))T
det A
Although this is an explicit and elegant formula for A−1, it is computationally intensive,
even for 3 × 3 matrices. However, for the 2 × 2 case it provides a useful formula to compute
104
Lecture 13
a b d −c
the matrix inverse. Indeed, if A = we have Cof(A) = and therefore
c d −b a
−1 1 d −b
A = .
ad − bc −c a
When does an integer matrix have an integer inverse? We can answer this question
using the Cofactor Method. Let us first be clear about what we mean by an integer matrix.
Suppose that A ∈ Rn×n is an invertible integer matrix. Then det(A) is a non-zero integer
and (Cof(A))T is an integer matrix. If A−1 is also an integer matrix then det(A−1 ) is also
an integer. Now det(A) det(A−1) = 1 thus it must be the case that det(A) = ±1. Suppose
on the other hand that det(A) = ±1. Then by the Cofactor method
1
A−1 = (Cof(A))T = ±(Cof(A))T
det(A)
and therefore A−1 is also an integer matrix. We have proved the following.
Theorem 13.2: An invertible integer matrix A ∈ Rn×n has an integer inverse A−1 if and
only if det A = ±1.
We can use the previous theorem to generate integer matrices with an integer inverse
as follows. Begin with an upper triangular matrix M0 having integer entries and whose
diagonal entries are either 1 or −1. By construction, det(M0 ) = ±1. Perform any sequence
of elementary row operations of Type 1 and Type 3. This generates a sequence of matrices
M1 , . . . , Mp whose entries are integers. Moreover,
M0 ∼ M1 ∼ · · · ∼ Mp .
Therefore,
±1 = det(M) = det(M1 ) = · · · = det(Mp ).
105
Applications of the Determinant
1
x1 = (b1 C11 + b2 C21 + · · · + bn Cn1 ).
det A
The expression b1 C11 + b2 C21 + · · · + bn Cn1 is the expansion of the determinant along the
first column of the matrix obtained from A by replacing the first column with b:
b1 a12 · · · a1n
b2 a22 · · · a2n
det .. .. .. . = b1 C11 + b2 C21 + · · · + bn Cn1
. . . ..
bn an2 · · · ann
Similarly,
1
x2 = (b1 C12 + b2 C22 + · · · + bn Cn2 )
det A
and (b1 C12 + b2 C22 + · · · + bn Cn2 ) is the expansion of the determinant along the second
column of the matrix obtained from A by replacing the second column with b. In summary:
Although this is an explicit and elegant formula for x, it is computationally intensive, and
used mainly for theoretical purposes.
106
Lecture 13
13.3 Volumes
The volume of the parallelepiped determined by the vectors v1 , v2 , v3 is
Vol(v1 , v2 , v3 ) = abs(v1T (v2 × v3 )) = abs(det v1 v2 v3 )
where abs(x) denotes the absolute value of the number x. Let A be an invertible matrix and
let w1 = Av1 , w2 = Av2 , w3 = Av3 . How are Vol(v1 , v2 , v2 ) and Vol(w1 , w2 , w2 ) related?
Compute:
Vol(w1 , w2 , w3 ) = abs(det w1 w2 w3 )
= abs det Av1 Av2 Av3
= abs det(A v1 v2 v3 )
= abs det A · det v1 v2 v3
= abs(det A) · Vol(v1 , v2 , v3 ).
Therefore, the number abs(det A) is the factor by which volume is changed under the linear
transformation with matrix A. In summary:
107
Applications of the Determinant
108
Lecture 14
Vector Spaces
1. the set Rn ,
In all of these sets, there is an operation of “addition“ and “multiplication by scalars”. Let’s
formalize then exactly what we mean by a vector space.
Definition 14.1: A vector space is a set V of objects, called vectors, on which two
operations called addition and scalar multiplication have been defined satisfying the
following properties. If u, v, w are in V and if α, β ∈ R are scalars:
(6) The scalar multiple of v by α, denoted αv, is in V. (closure under scalar multiplica-
tion)
(7) α(u + v) = αu + αv
(8) (α + β)v = αv + βv
(10) 1v = v
It can be shown that 0 · v = 0 for any vector v in V. To better understand the definition of
a vector space, we first consider a few elementary examples.
Solution. The circle is not closed under scalar multiplication. For example, take u = (1, 0) ∈
V and multiply by say α = 2. Then αu = (2, 0) is not in V. Therefore, property (6) of the
definition of a vector space fails, and consequently the unit disc is not a vector space.
Example 14.3. Let V be the graph of the quadratic function f (x) = x2 :
n o
V = (x, y) ∈ R2 | y = x2 .
Is V a vector space?
Solution. The set V is not closed under scalar multiplication. For example, u = (1, 1) is a
point in V but 2u = (2, 2) is not. You may also notice that V is not closed under addition
either. For example, both u = (1, 1) and v = (2, 4) are in V but u + v = (3, 5) and (3, 5) is
not a point on the parabola V. Therefore, the graph of f (x) = x2 is not a vector space.
110
Lecture 14
V = {(x, y) ∈ R2 | y = 2x}.
Is V a vector space?
Solution. We will show that V is a vector space. First, we verify that V is closed under
addition. We first note that an arbitrary point in V can be written as u = (x, 2x). Let then
u = (a, 2a) and v = (b, 2b) be points in V. Then
Therefore V is closed under addition. Verify that V is closed under scalar multiplication:
All the other properties of a vector space can be verified to hold; for example, addition is
commutative and associative in V because addition in R2 is commutative/associative, etc.
Therefore, the graph of the function f (x) = 2x is a vector space.
The following example is important (it will appear frequently) and is our first example
of what we could say is an “abstract vector space”. To emphasize, a vector space is a set
that comes equipped with an operation of addition and scalar multiplication and these two
operations satisfy the list of properties above.
Example 14.5. Let V = Pn [t] be the set of all polynomials in the variable t and of degree
at most n: n o
Pn [t] = a0 + a1 t + a2 t2 + · · · + an tn | a0 , a1 , . . . , an ∈ R .
Is V a vector space?
111
Vector Spaces
Then u + v is a polynomial of degree at most n and thus (u + v) ∈ Pn [t], and therefore this
shows that Pn [t] is closed under addition. Now let α be a scalar, define a new polynomial
(αu) as follows:
(αu)(t) = (αu0 ) + (αu1 )t + · · · + (αun )tn
Then (αu) is a polynomial of degree at most n and thus (αu) ∈ Pn [t]; hence, Pn [t] is closed
under scalar multiplication. The 0 vector in Pn [t] is the zero polynomial 0(t) = 0. One can
verify that all other properties of the definition of a vector space also hold; for example,
addition is commutative and associative, etc. Thus Pn [t] is a vector space.
Example 14.6. Let V = Mm×n be the set of all m × n matrices. Under the usual operations
of addition of matrices and scalar multiplication, is Mn×m a vector space?
Solution. Given matrices A, B ∈ Mm×n and a scalar α, we defined the sum A + B by adding
entry-by-entry, and αA by multiplying each entry of A by α. It is clear that the space
Mm×n is closed under these two operations. The 0 vector in Mm×n is the matrix of size
m × n having all entries equal to zero. It can be verified that all other properties of the
definition of a vector space also hold. Thus, the set Mm×n is a vector space.
Example 14.7. The n-dimensional Euclidean space V = Rn under the usual operations of
addition and scalar multiplication is vector space.
Example 14.8. Let V = C[a, b] denote the set of functions with domain [a, b] and co-domain
R that are continuous. Is V a vector space?
(3) W is closed under scalar multiplication, that is, if u is in W and α is a scalar then
αu is in W.
W = {(x, y) ∈ R2 | y = 2x}.
Is W a subspace of V = R2 ?
112
Lecture 14
Example 14.12. Let V = Mn×n be the vector space of all n × n matrices. We define the
trace of a matrix A ∈ Mn×n as the sum of its diagonal entries:
tr(A) = a11 + a22 + · · · + ann .
Let W be the set of all n × n matrices whose trace is zero:
W = {A ∈ Mn×n | tr(A) = 0}.
Is W a subspace of V?
Solution. If 0 is the n × n zero matrix then clearly tr(0) = 0, and thus 0 ∈ Mn×n . Suppose
that A and B are in W. Then necessarily tr(A) = 0 and tr(B) = 0. Consider the matrix
C = A + B. Then
tr(C) = tr(A + B) = (a11 + b11 ) + (a22 + b22 ) + · · · + (ann + bnn )
= (a11 + · · · + ann ) + (b11 + · · · + bnn )
= tr(A) + tr(B)
=0
113
Vector Spaces
Thus, tr(C) = 0, that is, C = αA ∈ W, and consequently W is closed under scalar multipli-
cation. Therefore, the set W is a subspace of V.
W = {u ∈ Pn [t] | u′ (1) = 0}
Solution. The zero polynomial 0(t) = 0 clearly has derivative at t = 1 equal to zero, that is,
0′ (1) = 0, and thus the zero polynomial is in W. Now suppose that u(t) and v(t) are two
polynomials in W. Then, u′ (1) = 0 and also v′ (1) = 0. To verify whether or not W is closed
under addition, we must determine whether the sum polynomial (u + v)(t) has a derivative
at t = 1 equal to zero. From the rules of differentiation, we compute
Therefore, the polynomial (u + v) is in W, and thus W is closed under addition. Now let α
be any scalar and let u(t) be a polynomial in W. Then u′ (1) = 0. To determine whether or
not the scalar multiple αu(t) is in W we must determine if αu(t) has a derivative of zero at
t = 1. Using the rules of differentiation, we compute that
Therefore, the polynomial (αu)(t) is in W and thus W is closed under scalar multiplication.
All three properties of a subspace hold for W and therefore W is a subspace of Pn [t].
Solution. The zero polynomial 0(t) = 0 clearly does not equal −1 at t = 2. Therefore, W
does not contain the zero polynomial and, because all three conditions of a subspace must be
satisfied for W to be a subspace, then W is not a subspace of Pn [t]. As an exercise, you may
want to investigate whether or not W is closed under addition and scalar multiplication.
114
Lecture 14
Example 14.16. For any vector space V, there are two trivial subspaces in V, namely, V
itself is a subspace of V and the set consisting of the zero vector W = {0} is a subspace of
V.
There is one particular way to generate a subspace of any given vector space V using the
span of a set of vectors. Recall that we defined the span of a set of vectors in Rn but we can
define the same notion on a general vector space V.
115
Vector Spaces
116
Lecture 15
Lecture 15
Linear Maps
Before we begin this Lecture, we review subspaces. Recall that W is a subspace of a vector
space V if W is a subset of V and
1. the zero vector 0 in V is also in W,
2. for any vectors u, v in W the sum u + v is also in W, and
3. for any vector u in W and any scalar α the vector αu is also in W.
In the previous lecture we gave several examples of subspaces. For example, we showed that
a line through the origin in R2 is a subspace of R2 and we gave examples of subspaces of
Pn [t] and Mn×m . We also showed that if v1 , . . . , vp are vectors in a vector space V then
W = span{v1 , v2 , . . . , vp }
is a subspace of V.
Example 15.2. Let V = Mn×n be the vector space of n × n matrices and let T : V → V be
the mapping
T(A) = A + AT .
117
Linear Maps
Is T is a linear mapping?
Solution. Let A and B be matrices in V. Then using the properties of the transpose and
regrouping we obtain:
T(A + B) = (A + B) + (A + B)T
= A + B + AT + BT
= (A + AT ) + (B + BT )
= T(A) + T(B).
This proves that T satisfies both conditions of Definition 15.1 and thus T is a linear mapping.
Example 15.3. Let V = Mn×n be the vector space of n × n matrices, where n ≥ 2, and let
T : V → R be the mapping
T(A) = det(A)
Is T is a linear mapping?
Solution. If T is a linear mapping then according to Definition 15.1, we must have T(A +
B) = det(A + B) = det(A) + det(B) and also T(αA) = αT(A) for any scalar α. Do
these properties actually hold though? For example, we know from the properties of the
determinant that det(αA) = αn det(A) and therefore it does not hold that T(αA) = αT(A)
unless α = 1. Therefore, T is not a linear mapping. Also, it does not hold in general that
det(A + B) = det(A) + det(B); in fact it rarely holds. For example, if
2 0 −1 1
A= , B=
0 1 0 3
then det(A) = 2, det(B) = −3 and therefore det(A) + det(B) = −1. On the other hand,
1 1
A+B=
0 4
Example 15.4. Let V = Pn [t] be the vector space of polynomials in the variable t of degree
no more than n ≥ 1. Consider the mapping T : V → V define as
118
Lecture 15
Is T is a linear mapping?
Solution. Let f (t) and g(t) be polynomials of degree no more than n ≥ 1. Then
Therefore, T(f (t) + g(t)) = T(f (t)) + T(g(t)). Now let α be any scalar. Then
1. The kernel of T is the set of vectors v in the domain V that get mapped to the zero
vector, that is, T(v) = 0. We denote the kernel of T by ker(T):
2. The range of T is the set of vectors b in the codomain U for which there exists at
least one v in V such that T(v) = b. We denote the range of T by Range(T):
You may have noticed that the definition of the range of a linear mapping on an abstract
vector space is the usual definition of the range of a function. Not surprisingly, the kernel
and range are subspaces of the domain and codomain, respectively.
119
Linear Maps
Proof. Suppose that v and u are in ker(T). Then T(v) = 0 and T(u) = 0. Then by linearity
of T it holds that
T(v + u) = T(v) + T(u) = 0 + 0 = 0.
Therefore, since T(u + v) = 0 then u + v is in ker(T). This shows that ker(T) is closed
under addition. Now suppose that α is any scalar and v is in ker(T). Then T(v) = 0 and
thus by linearity of T it holds that
T(αv) = αT(v) = α0 = 0.
Therefore, since T(αv) = 0 then αv is in ker(T) and this proves that ker(T) is closed under
scalar multiplication. Lastly, by linearity of T it holds that
T(0) = T(v − v) = T(v) − T(v) = 0
that is, T(0) = 0. Therefore, the zero vector 0 is in ker(T). This proves that ker(T) is a
subspace of V. The proof that Range(T) is a subspace of U is left as an exercise.
Example 15.7. Let V = Mn×n be the vector space of n × n matrices and let T : V → V be
the mapping
T(A) = A + AT .
Describe the kernel of T.
Solution. A matrix A is in the kernel of T if T(A) = A + AT = 0, that is, if AT = −A.
Hence,
ker(A) = {A ∈ Mn×n | AT = −A}.
What type of matrix A satisfies AT = −A? For example, consider the case that A is the
2 × 2 matrix
a11 a12
A=
a21 a22
and AT = −A. Then
a11 a21 −a11 −a12
= .
a12 a22 −a21 −a22
Therefore, it must hold that a11 = −a11 , a21 = −a12 and a22 = −a22 . Then necessarily
a11 = 0 and a22 = 0 and a12 can be arbitrary. For example, the matrix
0 7
A=
−7 0
satisfies AT = −A. Using a similar computation as above, a 3 × 3 matrix satisfies AT = −A
if A is of the form
0 a b
A = −a 0 c
−b −c 0
120
Lecture 15
Example 15.8. Let V be the vector space of differentiable functions on the interval [a, b].
That is, f is an element of V if f : [a, b] → R is differentiable. Describe the kernel of the
linear mapping T : V → V defined as
Solution. A function f is in the kernel of T if T(f (x)) = 0, that is, if f (x) + f ′ (x) = 0.
Equivalently, if f ′ (x) = −f (x). What functions f do you know of satisfy f ′ (x) = −f (x)?
How about f (x) = e−x ? It is clear that f ′ (x) = −e−x = −f (x) and thus f (x) = e−x is in
ker(T). How about g(x) = 2e−x ? We compute that g ′(x) = −2e−x = −g(x) and thus g is
also in ker(T). It turns out that the elements of ker(T) are of the form f (x) = Ce−x for a
constant C.
Null(A) = {v ∈ Rn | Av = 0}.
ker(TA) = Null(A).
Hence, by Theorem 15.10, if u and v are two solutions to the linear system Ax = 0 then
αu + βv is also a solution:
121
Linear Maps
Is W a subspace of V?
Solution. The set W is the null space of the matrix 1 × 4 matrix A given by
A = 2 −3 1 −7 .
x = t1 v1 + t2 v2 + · · · + td vd
Null(A) = span{v1 , v2 , . . . , vd }.
Example 15.12. Find a spanning set for the null space of the matrix
−3 6 −1 1 −7
A = 1 −2
2 3 −1 .
2 −4 5 8 −4
Solution. The null space of A is the solution set of the homogeneous system Ax = 0.
Performing elementary row operations one obtains
1 −2 0 −1 3
A ∼ 0 0 1 2 −2 .
0 0 0 0 0
Clearly r = rank(A) and since n = 5 we will have d = 3 vectors in a spanning set for
Null(A). Letting x5 = t1 , and x4 = t2 , then from the 2nd row we obtain
x3 = −2t2 + 2t1 .
x1 = 2t3 + t2 − 3t1 .
122
Lecture 15
Therefore,
−3 1 2
0 0 1
Null(A) = span 2 , −2 0
0 1 0
1 0 0
| {z } | {z } |{z}
v1 v2 v3
Ax = x1 v1 + x2 v2 + · · · + xn vn
Definition 15.13: Let A ∈ Mm×n be a matrix. The span of the columns of A is called
the column
space of A. The column space of A is denoted by Col(A). Explicitly, if
A = v1 v2 · · · vn then
Col(A) = span{v1 , v2 , . . . , vn }.
Range(TA ) = Col(A).
123
Linear Maps
• what the column space of a matrix is and how to determine if a given vector is in the
column space
124
Lecture 16
are redundant since a total displacement in the NORTH-EAST direction can be obtained
by combining individual NORTH and EAST displacements. With these vague statements
out of the way, we introduce the formal definition of what it means for a set of vectors to be
“efficient”.
Definition 16.1: Let V be a vector space and let {v1 , v2 , . . . , vp } be a set of vectors in
V. Then {v1 , v2 , . . . , vp } is linearly independent if the only scalars c1 , c2 , . . . , cp that
satisfy the equation
c1 v1 + c2 v2 + · · · + cp vp = 0
are the trivial scalars c1 = c2 = · · · = cp = 0. If the set {v1 , . . . , vp } is not linearly
independent then we say that it is linearly dependent.
We now describe the redundancy in a set of linear dependent vectors. If {v1 , . . . , vp } are
linearly dependent, it follows that there are scalars c1 , c2 , . . . , cp , at least one of which is
nonzero, such that
c1 v1 + c2 v2 + · · · + cp vp = 0. (⋆)
For example, suppose that {v1 , v2 , v3 , v4 } are linearly dependent. Then there are scalars
c1 , c2 , c3 , c4 , not all of them zero, such that equation (⋆) holds. Suppose, for the sake of
argument, that c3 6= 0. Then,
c1 c2 c4
v3 = − v1 − v2 − v4 .
c3 c3 c3
Linear Independence, Bases, and Dimension
Therefore, when a set of vectors is linearly dependent, it is possible to write one of the vec-
tors as a linear combination of the others. It is in this sense that a set of linearly dependent
vectors are redundant. In fact, if a set of vectors are linearly dependent we can say even
more as the following theorem states.
Example 16.3. Show that the following set of 2 × 2 matrices is linearly dependent:
1 2 −1 3 5 0
A1 = , A2 = , A3 = .
0 −1 1 0 −2 −3
Solution. It is clear that A1 and A2 are linearly independent, i.e., A1 cannot be written as
a scalar multiple of A2 , and vice-versa. Since the (2, 1) entry of A1 is zero, the only way to
get the −2 in the (2, 1) entry of A3 is to multiply A2 by −2. Similary, since the (2, 2) entry
of A2 is zero, the only way to get the −3 in the (2, 2) entry of A3 is to multiply A1 by 3.
Hence, we suspect that 3A1 − 2A2 = A3 . Verify:
3 6 −2 6 5 0
3A1 − 2A2 = − = = A3
0 −3 2 0 −2 −3
Therefore, 3A1 − 2A2 − A3 = 0 and thus we have found scalars c1 , c2 , c3 not all zero such
that c1 A1 + c2 A2 + c3 A3 = 0.
16.2 Bases
We now introduce the important concept of a basis. Given a set of vectors {v1 , . . . , vp−1 , vp }
in V, we showed that W = span{v1 , v2 , . . . , vp } is a subspace of V. If say vp is linearly
dependent on v1 , v2 , . . . , vp−1 then we can remove vp and the smaller set {v1 , . . . , vp−1 } still
spans all of W:
126
Lecture 16
Example 16.5. Show that the standard unit vectors form a basis for V = R3 :
1 0 0
e1 = 0 , e2 = 1 , e3 = 0
0 0 1
c1 e1 + c2 e2 + c3 e3 = 0
127
Linear Independence, Bases, and Dimension
Col(A) = span{v1 , v2 , v3 } = R3 .
−2 1 −3
0 0 0
Theorem 16.10: Let V be a vector space. Then all bases of V have the same number of
vectors.
Proof: We will prove the theorem for the case that V = Rn . We already know that the
standard unit vectors {e1 , e2 , . . . , en } is a basis of Rn . Let {u1 , u2 , . . . , up } be nonzero vec-
tors in Rn and suppose first that p > n. In Lecture 6, Theorem 6.7, we proved that any set
of vectors in Rn containing more than n vectors is automatically linearly dependent. The
reason is that the RREF of A = u1 u2 · · · up will contain at most r = n leading ones,
128
Lecture 16
The previous theorem does not say that every set {v1 , v2 , . . . , vn } of nonzero vectors in
R containing n vectors is automatically a basis for Rn . For example,
n
1 0 2
v1 = 0 , v2 = 1 , v3 = 3
0 0 0
is not in the span of {v1 , v2 , v3 }. All that we can say is that a set of vectors in Rn containing
fewer or more than n vectors is automatically not a basis for Rn . From Theorem 16.10, any
basis in Rn must have exactly n vectors. In fact, on a general abstract vector space V, if
{v1 , v2 , . . . , vn } is a basis for V then any other basis for V must have exactly n vectors also.
Because of this result, we can make the following definition.
Definition 16.11: Let V be a vector space. The dimension of V, denoted dim V, is the
number of vectors in any basis of V. The dimension of the trivial vector space V = {0} is
defined to be zero.
There is one subtle issue we are sweeping under the rug: Does every vector space have a
basis? The answer is yes but we will not prove this result here.
Moving on, suppose that we have a set B = {v1 , v2 , . . . , vn } in Rn containing exactly n
vectors. For B = {v1 , v2 , . . . , vn } to be a basis of Rn , the set B must be linearly independent
and span B = Rn . In fact, it can be shown that if B is linearly independent then the spanning
condition span B = Rn is automatically satisfied, and vice-versa. For example, say the vec-
tors {v1 , v2 , . . . , vn } in Rn are linearly independent, and put A = [v1 v2 · · · vn ]. Then A−1
exists and therefore Ax = b is always solvable. Hence, Col(A) = span {v1 , v2 , . . . , vn } = Rn .
In summary, we have the following theorem.
129
Linear Independence, Bases, and Dimension
det A = −2 6= 0
Hence, rank(A) = 4 and thus the columns of A are linearly independent. Therefore, the
vectors v1 , v2 , v3 , v4 form a basis for R4 .
A subspace W of a vector space V is a vector space in its own right, and therefore also
has dimension. By definition, if B = {v1 , . . . , vk } is a linearly independent set in W and
span{v1 , . . . , vk } = W, then B is a basis for W and in this case the dimension of W is k.
Since an n-dimensional vector space V requires exactly n vectors in any basis, then if W is
a strict subspace of V then
dim W < dim V.
As an example, in V = R3 subspaces can be classified by dimension:
1. The zero dimensional subspace in R3 is W = {0}.
2. The one dimensional subspaces in R3 are lines through the origin. These are spanned
by a single non-zero vector.
3. The two dimensional subspaces in R3 are planes through the origin. These are spanned
by two linearly independent vectors.
4. The only three dimensional subspace in R3 is R3 itself. Any set {v1 , v2 , v3 } in R3 that
is linearly independent is a basis for R3 .
Example 16.14. Find a basis for Null(A) and the dim Null(A) if
−2 4 −2 −4
A= 2 −6 −3 1 .
−3 8 2 −3
Solution. By definition, the Null(A) is the solution set of the homogeneous system Ax = 0.
Row reducing we obtain
1 0 6 5
A∼ 0 1 5/2 3/2
0 0 0 0
130
Lecture 16
1 0
span the null space (A) and they are linearly independent. Therefore, B = {v1 , v2 } is a
basis for Null(A) and therefore dim Null(A) = 2. In general, the dimension of the Null(A)
is the number of free parameters in the solution set of the system Ax = 0, that is,
Example 16.15. Find a basis for Col(A) and the dim Col(A) if
1 2 3 −4 8
1 2 0 2 8
A=
2 4 −3
.
10 9
3 6 0 6 9
Solution. By definition, the column space of A is the span of the columns of A, which we
denote by A = [v1 v2 v3 v4 v5 ]. Thus, to find a basis for Col(A), by trial and error we could
determine the largest subset of the columns of A that are linearly independent. For example,
first we determine if {v1 , v2 } is linearly independent. If yes, then add v3 and determine if
{v1 , v2 , v3 } is linearly independent. If {v1 , v2 } is not linearly independent then discard v2
and determine if {v1 , v3 } is linearly independent. We continue this process until we have
determined the largest subset of the columns of A that is linearly independent, and this will
yield a basis for Col(A). Instead, we can use the fact that matrices that are row equivalent
induce the same solution set for the associated homogeneous system. Hence, let B be the
RREF of A:
1 2 0 2 0
0 0 1 −2 0
B = rref(A) =
0 0 0 0 1
0 0 0 0 0
131
Linear Independence, Bases, and Dimension
By inspection, v2 = 2v1 and v4 = 2v1 − 2v3 . Thus, because b1 , b3 , b5 are linearly inde-
pendent columns of B =rref(A), then v1 , v3 , v5 are linearly independent columns of A.
Therefore, we have
1 3 8
1 0 8
Col(A) = span{v1 , v3 , v5 } = span ,
, 9
2 −3
3 0 9
and consequently dim Col(A) = 3. This procedure works in general: To find a basis
for the Col(A), row reduce A ∼ B until you can determine which columns of B are linearly
independent. The columns of A in the same position as the linearly independent columns
of B form a basis for the Col(A).
WARNING: Do not take the linearly independent columns of B as a basis for Col(A).
Always go back to the original matrix A to select the columns.
After this lecture you should know the following:
• how to find a basis for the null space and column space of a matrix A
132
Lecture 17
Definition 17.1: The rank of a matrix A is the dimension of its column space. We will
use rank(A) to denote the rank of A.
Recall that Col(A) = Range(TA ), and thus the rank of A is the dimension of the range of
the linear mapping TA . The range of a mapping is sometimes called the image.
Definition 17.2: The nullity of a matrix A is the dimension of its nullspace Null(A).
We will use nullity(A) to denote the nullity of A.
Recall that (A) = ker(TA ), and thus the nullity of A is the dimension of the kernel of the
linear mapping TA .
The rank and nullity of a matrix are connected via the following fundamental theorem
known as the Rank Theorem.
n = rank(A) + nullity(A).
Proof. A basis for the column space is obtained by computing rref(A) and identifying the
columns that contain a leading 1. Each column of A corresponding to a column of rref(A)
with a leading 1 is a basis vector for the column space of A. Therefore, if r is the number
of leading 1’s then r = rank(A). Now let d = n − r. The number of free parameters in the
The Rank Theorem
solution set of Ax = 0 is d and therefore a basis for Null(A) will contain d vectors, that is,
nullity(A) = d. Therefore,
nullity(A) = n − rank(A).
(ii) Col(A) = Rn
(iii) rank(A) = n
134
Lecture 17
(v) nullity(A) = 0
135
The Rank Theorem
136
Lecture 18
Coordinate Systems
18.1 Coordinates
Recall that a basis of a vector space V is a set of vectors B = {v1 , v2 , . . . , vn } in V such that
1. the set B spans all of V, that is, V = span(B), and
x∗ = c1 v1 + c2 v2 + · · · + cn vn .
Moreover, from the definition of linear independence given in Definition 6.1, any vector
x ∈ span(B) can be written in only one way as a linear combination of v1 , . . . , vn . In other
words, for the x∗ above, there does not exist other scalars t1 , . . . , tn such that also
x∗ = t1 v1 + t2 v2 + · · · + tn vn .
To see this, suppose that we can write x∗ in two different ways using B:
x∗ = c1 v1 + c2 v2 + · · · + cn vn
x∗ = t1 v1 + t2 v2 + · · · + tn vn .
Then
0 = x∗ − x∗ = (c1 − t1 )v1 + (c2 − t2 )v2 + · · · + (cn − tn )vn .
Since B = {v1 , . . . , vn } is linearly independent, the only linear combination of v1 , . . . , vn
that gives the zero vector 0 is the trivial linear combination. Therefore, it must be the case
that ci − ti = 0, or equivalently that ci = ti for all i = 1, 2 . . . , n. Thus, there is only one way
to write x∗ in terms of B = {v1 , . . . , vn }. Hence, relative to the basis B = {v1 , v2 , . . . , vn },
the scalars c1 , c2 , . . . , cn uniquely determine the vector x, and vice-versa.
Our preceding discussion on the unique representation property of vectors in a given basis
leads to the following definition.
Coordinate Systems
Definition 18.1: Let B = {v1 , . . . , vn } be a basis for V and let x ∈ V. The coordinates
of x relative to the basis B are the unique scalars c1 , c2 , . . . , cn such that
x = c1 v1 + c2 v2 + · · · + cn vn .
The notation [x]B indicates that these are coordinates of x with respect to the basis B.
If it is clear what basis we are working with, we will omit the subscript B and simply write
[x] for the coordinates of x relative to B.
Solution. Let v1 = (1, 1) and let v2 = (−1, 1). By definition, the coordinates of v with
respect to B are the scalars c1 , c2 such that
1 −1 c1
v = c1 v1 + c2 v2 =
1 1 c2
If we put P = [v1 v2 ], and let [v]B = (c1 , c2 ), then we need to solve the linear system
v = P[v]B
Solving the linear system, one finds that the solution is [v]B = (2, −1), and therefore this is
the B-coordinate vector of v, or the coordinates of v, relative to B.
It is clear how the procedure of the previous example can be generalized. Let B =
n n
{v1 , v2 , . . . , vn } be a basis for R and let v be any vector in R . Put P = v1 v2 · · · vn .
Then the B-coordinates of v is the unique column vector [v]B solving the linear system
Px = v
138
Lecture 18
[v]B = P−1 v.
We
remark
that if an inconsistent row arises when you row reduce the augmented matrix
P v then you have made an error in your row reduction algorithm. In summary, to find
coordinates with respect to a basis B in Rn , we need to solve a square linear system.
Solution. Clearly,
3 1 0 0
v = 11 = 3 0 + 11 1 − 7 0
−7 0 0 1
139
Coordinate Systems
Example 18.5. Let P3 [t] be the vector space of polynomials of degree at most 3.
(i) Show that B = {1, t, t2 , t3 } is a basis for P3 [t].
c0 + c1 t + c2 t2 + c3 t3 = 0.
Since the above equality must hold for all values of t, we conclude that c0 = c1 = c2 = c3 = 0.
Therefore, B is linearly independent, and consequently a basis for P3 [t]. In the basis B, the
coordinates of v(t) = 3 − t2 − 7t3 are
3
0
[v(t)]B =
−1
−7
If
1 0 0 1 0 0 0 0 c1 c2 0 0
c1 + c2 + c3 + c4 = =
0 0 0 0 1 0 0 1 c3 c4 0 0
140
Lecture 18
−1
The basis B above is the standard basis of M2×2 .
x = P[x]B . (⋆)
P−1 x = [x]B .
Therefore, P−1 maps coordinate vectors in the standard basis to coordinates relative to B.
141
Coordinate Systems
On the other hand, the inverse matrix P−1 maps standard coordinates in R3 to B-coordinates.
One can verify that
4 3 6
P−1 = −1 −1 −1
0 0 −1
Therefore, the B coordinates of v are
4 3 6 2 5
−1
[v]B = P v = −1 −1 −1
−1 = −1
0 0 −1 0 0
When V is an abstract vector space, e.g. Pn [t] or Mn×n , the notion of a coordinate
mapping is similar as the case when V = Rn . If V is an n-dimensional vector space and
B = {v1 , v2 , . . . , vn } is a basis for V, we define the coordinate mapping P : V → Rn relative
to B as the mapping
P(v) = [v]B .
Example 18.8. Let V = M2×2 and let B = {A1 , A2 , A3 , A4} be the standard basis for
M2×2 . What is P : M2×2 → R4 ?
Solution. Recall,
1 0 0 1 0 0 0 0
B = {A1 , A2 , A3 , A4} = , , ,
0 0 0 0 1 0 0 1
a11 a12
Then for any A = we have
a21 a22
a11
a11 a12 a12
P a21 .
=
a21 a22
a22
v = c1 v1 + c2 v2 + · · · + cn vn
142
Lecture 18
and thus [v]B = (c1 , c2 , . . . , cn ) are the coordinates of v in the basis B By linearity of the
mapping T we have
T(v) = T(c1 v1 + c2 v2 + · · · + cn vn )
Now each vector T(vj ) is in W and therefore because γ is a basis of W there are scalars
a1,j , a2,j , . . . , am,j such that
In other words,
[T(vj )]γ = (a1,j , a2,j , . . . , am,j )
Substituting T(vj ) = a1,j w1 + a2,j w2 + · · · + am,j wm for each j = 1, 2, . . . , n into
Therefore,
[T(v)]γ = A[v]B
where A is the m × n matrix given by
A = [T(v1 )]γ [T(v2 )]γ · · · [T(vn )]γ
The matrix A is the matrix representation of the linear mapping T in the bases B and γ.
Example 18.9. Consider the vector space V = P2 [t] of polynomial of degree no more than
two and let T : V → V be defined by
143
Coordinate Systems
c1 v1 + c2 v2 + c3 v3 = 0
(b) The coordinates of v(t) = −t2 + 3t + 1 are the unique scalars (c1 , c2 , c3 ) such that
c1 v1 + c2 v2 + c3 v3 = v
And therefore
−18/5 −6/5 24/5
A = 4/5 −2/5 8/5
0 0 −2
144
Lecture 18
145
Coordinate Systems
146
Lecture 19
Change of Basis
PB = [v1 v2 · · · vn ].
x = PB [x]B .
The components of the vector x are the coordinates of x in the standard basis E = {e1 , . . . , en }.
In other words,
[x]E = x.
Therefore,
[x]E = PB [x]B .
We can therefore interpret PB as the matrix mapping that maps the B-coordinates of x to
the E-coordinates of x. To make this more explicit, we sometimes use the notation
E PB
[x]E = (E PB )[x]B .
Hence, the matrix (E PB )−1 maps standard coordinates to B-coordinates, see Figure 19.1. It
is natural then to introduce the notation
B PE = (E PB )−1
Change of Basis
V = Rn
b
x
B PE = (E PB )−1
[x]B
1 −3 3 −8
v1 = 0 , v2 = 4 , v2 = −6 , x = 2 .
0 0 3 3
(a) Show that the set of vectors B = {v1 , v2 , v3 } forms a basis for Rn .
(b) Find the change-of-coordinates matrix from B to standard coordinates.
(c) Find the coordinate vector [x]B for the given x.
Solution. Let
1 −3 3
PB = 0 4 −6
0 0 3
It is clear that det(PB ) = 12, and therefore v1 , v2 , v3 are linearly independent. Therefore,
B is a basis for Rn . The matrix PB takes B-coordinates to standard coordinates. The
B-coordinate vector [x]B = (c1 , c2 , c3 ) is the unique solution to the linear system
x = PB [x]B
[x]B = (−5, 2, 1)
We verify that [x]B = (−5, 2, 1) are indeed the coordinates of x = (−8, 2, 3) in the basis
148
Lecture 19
B = {v1 , v2 , v3 }:
1 −3 3
(−5)v1 + (2)v2 + (1)v3 = −5 0 + 2 4 + −6
0 0 3
−5 −6 3
= 0 + 8 + −6
0 0 3
−8
= 2
3
| {z }
x
E PB
takes as input the B-coordinates [x]B of a vector x and returns the coordinates of x in the
standard basis. We now consider the situation of dealing with two basis B and C where
neither is assumed to be the standard basis E. Hence let B = {v1 , v2 , . . . , vn } and let
C = {w1 , . . . , wn } be two basis of Rn and let
E PB = [v1 v2 · · · vn ]
E PC = [w1 w2 · · · wn ].
Then if [x]C is the coordinate vector of x in the basis C then
x = (E PC )[x]C .
x = (E PC )[x]C .
Then
(E PC )[x]C = (E PB )[x]B
and because E PC is invertible we have that
149
Change of Basis
Hence, the matrix (E PC )−1 (E PB ) maps the B-coordinates of x to the C-coordinates of x. For
this reason, it is natural to use the notation (see Figure 19.2)
C PB = (E PC )−1 (E PB ).
V = Rn
b
x
E PC E PB
b b
[x]C C PB [x]B
(E PC )−1 vi ,
C PB = (E PC )−1 (E PB )
150
Lecture 19
E PB [x]B =x
Row reducing the augmented matrix [E PB x] we obtain
2
[x]B =
1
Next, to find [x]C we can solve the linear system
E PC [x]C =x
Alternatively, since we now know [x]B and C PB has been computed, to find [x]C we simply
multiply C PB by [x]B :
" #" # " #
2 −3/2 2 5/2
[x]C = C PB [x]B = =
−3 5/2 1 −7/2
5/2 0
Let’s verify that [x]C = are indeed the C-coordinates of x = :
−7/2 −2
" #" # " #
−7 −5 5/2 0
P
E C [x] C = = .
9 7 −7/2 −2
151
Change of Basis
152
Lecture 20
Lecture 20
u • v = u1 v1 + u2 v2 + · · · + un vn .
Notice that the inner product u • v can be computed as a matrix multiplication as follows:
v1
v2
u • v = uT v = u1 u2 · · · un .. .
.
vn
The following theorem summarizes the basic algebraic properties of the inner product.
(a) u • v = v • u
(b) (u + v) • w = u • w + v • w
153
Inner Products and Orthogonality
Example 20.3. Let u = (2, −5, −1) and let v = (3, 2, −3). Compute u • v, v • u, u • u, and
v • v.
Solution. By definition:
kuk = 1.
kαuk = |α|kuk.
Proof. We have
p
kαuk = (αu) • (αu)
p
= α2 (u • u)
√
= |α| u • u
= |α|kuk.
By Theorem 20.5, any non-zero vector u ∈ Rn can be scaled to obtain a new unit vector
in the same direction as u. Indeed, suppose that u is non-zero so that kuk =
6 0. Define the
new vector
1
v= u
kuk
154
Lecture 20
1
Notice that α = kuk
is just a scalar and thus v is a scalar multiple of u. Then by Theorem 20.5
we have that
1
kvk = kαuk = |α| · kuk = · kuk = 1
kuk
and therefore v is a unit vector, see Figure 20.1. The process of taking a non-zero vector u
1
and creating the new vector v = kuk u is sometimes called normalization of u.
u
1
v= kuk
u
Example 20.6. Let u = (2, 3, 6). Compute kuk and find the unit vector v in the same
direction as u.
Solution. By definition,
√ √ √
kuk = u•u= 22 + 32 + 62 = 49 = 7.
Then the unit vector that is in the same direction as u is
2 2/7
1 1
v= u = 3 = 3/7
kuk 7
6 6/7
Verify that kvk = 1:
p p p √
kvk = (2/7)2 + (3/7)2 + (6/7)2 = 4/49 + 9/49 + 36/49 = 49/49 = 1 = 1.
Now that we have the definition of the length of a vector, we can define the notion of
distance between two vectors.
Definition 20.7: Let u and v be vectors in Rn . The distance between u and v is the
length of the vector u − v. We will denote the distance between u and v by d(u, v). In
other words,
d(u, v) = ku − vk.
3 7
Example 20.8. Find the distance between u = and v = .
−2 −9
Solution. We compute:
p √
d(u, v) = ku − vk = (3 − 7)2 + (−2 + 9)2 = 65.
155
Inner Products and Orthogonality
20.2 Orthogonality
In the context of vectors in R2 and R3 , orthogonality is synonymous with perpendicularity.
Below is the general definition.
In R2 and R3 , the notion of orthogonality should be familiar to you. In fact, using the
Law of Cosines in R2 or R3 , one can prove that
The general notion of orthogonality in Rn leads to the following theorem from grade
school.
Theorem 20.10: (Pythagorean Theorem) Two vectors u and v are orthogonal if and
only if ku + vk2 = kuk2 + kvk2 .
p
Solution. First recall that ku + vk = (u + v) • (u + v) and therefore
ku + vk2 = (u + v) • (u + v)
=u•u+u•v+v•u+v•v
In the following theorem we prove that orthogonal sets are linearly independent.
156
Lecture 20
c1 u1 + c2 u2 + · · · + cp up = 0.
Take the inner product of u1 with both sides of the above equation:
Since the set is orthogonal, the left-hand side of the last equation simplifies to c1 (u1 • u1 ).
The right-hand side simplifies to 0. Hence,
c1 (u1 • u1 ) = 0.
But u1 • u1 = ku1 k2 is not zero and therefore the only way that c1 (u1 • u2 ) = 0 is if c1 = 0.
Repeat the above steps using u2 , u3 , . . . , up and conclude that c2 = 0, c3 = 0, . . . , cp =
0. Therefore, {u1 , . . . , up } is linearly independent. If p = n, then the set {u1 , . . . , up } is
automatically a basis for Rn .
Example 20.13. Is the set {u1 , u2 , u3 } an orthogonal set?
1 0 −5
u1 = −2 , u2 = 1 , u3 = −2
1 2 1
Solution. Compute
157
Inner Products and Orthogonality
158
Lecture 20
Hence, computing coordinates with respect to an orthonormal basis can be done without
performing any row operations and all we need to do is compute inner products! We make
the important observation that an alternate expression for [x]B is
u1 • x uT1
u2 • x uT
2
[x]B = .. = .. x = UT x
. .
un • x uTn
where U = [u1 u2 · · · un ]. On the other hand, recall that by definition [x]B satisfies
U[x]B = x, and therefore [x]B = U−1 x. If we compare the two identities
we suspect then that U−1 = UT . This is indeed the case. To see this, let B = {u1 , u2 , . . . , un }
be an orthonormal basis for Rn and put
U = [u1 u2 · · · un ].
= In .
159
Inner Products and Orthogonality
Therefore,
U−1 = UT .
A matrix U ∈ Rn×n such that
UT U = UUT = In
is called a orthogonal matrix. Hence, if B = {u1 , u2 , . . . , un } is an orthonormal set then
the matrix
U = u1 u2 · · · un
is an orthogonal matrix.
Then B = {u1 , u2 , u3 } is now an orthonormal set and thus since B consists of three
vectors then B is an orthonormal basis of R3 .
(c) Finally, computing coordinates in an orthonormal basis is easy:
u1 • x 0
√
[x]B = u2 • x = 2/ 18
u3 • x 5/3
160
Lecture 20
x1 = x • e1
x2 = x • e2
x3 = x • e3
(u1 + u2 ) • w = u1 • w + u2 • w = 0 + 0 = 0.
u2 + u3 = 0
u1 − u3 = 0
This is a linear system for the unknowns u1 , u2, u3 , u4. The general solution to the linear
system is
1 0
0 1
1 + s −1 .
u = t
0 0
Therefore, a basis for W⊥ is {(1, 0, 1, 0), (0, 1, −1, 0)}.
161
Inner Products and Orthogonality
162
Lecture 21
Lecture 21
Example 21.2. Determine if the given vectors v and u are eigenvectors of A? If yes, find
the eigenvalue of A associated to the eigenvector.
4 −1 6 −3 −1
A = 2 1 6 , v = 0 , u = 2 .
2 −1 8 1 1
Solution. Compute
4 −1 6 −3 −6
Av = 2 1 6 0 = 0
2 −1 8 1 2
−3
=2 0
1
= 2v
163
Eigenvalues and Eigenvectors
Solution. We compute
0
Av = 0 = 0.
0
Hence, if λ = 0 then λv = 0 and thus Av = λv. Therefore, v is an eigenvector of A with
corresponding eigenvalue λ = 0.
How does one find the eigenvectors/eigenvalues of a matrix A? The general procedure
is to first find the eigenvalues of A and then for each eigenvalue find the corresponding
eigenvectors. In this section, however, we will instead suppose that we have already found
the eigenvalues of A and concern ourselves with finding the associated eigenvectors. Suppose
then that λ is known to be an eigenvalue of A. How do we find an eigenvector v corresponding
to the eigenvalue λ? To answer this question, we note that if v is to be an eigenvector of A
with eigenvalue λ then v must satisfy the equation
Av = λv.
(A − λI)v = 0.
The last equation says that if v is to be an eigenvector of A with eigenvalue λ then v must
be in the null space of A − λI:
v ∈ Null(A − λI).
164
Lecture 21
(A − λI)x = 0.
Recall that the null space of any matrix is a subspace and for this reason we call the subspace
Null(A − λI) the eigenspace of A corresponding to λ.
and {v} is a basis for the eigenspace. The vector v is of course an eigenvector of A with
eigenvalue λ = 4 and also (of course) any multiple of v is also eigenvector of A with λ = 4.
165
Eigenvalues and Eigenvectors
can be written as
1 1
x = t1 0 + t2 2
1 0
Therefore, the eigenspace of A corresponding to λ = 3 is
1 1
Null(A − 3I) = span{v1 , v2 } = span 0 , 2 .
1 0
The vectors v1 and v2 are two linearly independent eigenvectors of A with eigenvalue λ = 3.
Therefore {v1 , v2 } is a basis for the eigenspace of A with eigenvalue λ = 3. You can verify
that Av1 = 3v1 and Av2 = 3v2 .
As shown in the last example, there may exist more than one linearly independent eigen-
vector of A corresponding to the same eigenvalue, in other words, it is possible that the
dimension of the eigenspace Null(A − λI) is greater than one. What can be said about the
eigenvectors of A corresponding to different eigenvalues?
166
Lecture 21
vp+1 = c1 v1 + c2 v2 + · · · + cp vp . (21.1)
λp+1vp+1 = c1 λ1 v1 + c2 λ2 v2 + · · · + cp λp vp . (21.2)
Now {v1 , . . . , vp } is linearly independent and thus ci (λi − λp+1 ) = 0. But the eigenvalues
{λ1 , . . . , λk } are all distinct and so we must have c1 = c2 = · · · = cp = 0. But from (21.1)
this implies that vp+1 = 0, which is a contradiction because eigenvectors are by definition
non-zero. This proves that {v1 , v2 , . . . , vk } is a linearly independent set.
Find bases for the eigenspaces corresponding to λ1 and λ2 and show that any two vectors
from these distinct eigenspaces are linearly independent.
Solution. Compute
−5 6 3
A − λ1 I = 1 6 9
8 −6 0
and one finds that
−3
(A − λ1 I) = span −4
3
167
Eigenvalues and Eigenvectors
The last matrix has rank r = 2, and thus v1 , v2 are indeed linearly independent.
Av = 0 · v = 0.
Theorem 21.8: The matrix A ∈ Rn×n is invertible if and only if λ = 0 is not an eigenvalue
of A.
In fact, later we will see that det(A) is the product of its eigenvalues.
168
Lecture 22
Lecture 22
Thus, if A is 2 × 2 then
169
The Characteristic Polynomial
In summary, to find the eigenvalues of A we must find the roots of the characteristic poly-
nomial:
p(λ) = det(A − λI).
The following theorem asserts that what we observed for the case n = 2 is indeed true for
all n.
Therefore, the claim holds for n = 2. By induction, suppose that the claims hold for n ≥ 2.
If A is a (n + 1) × (n + 1) matrix then expanding det(A − λI) along the first row:
n
X
det(A − λI) = (a11 − λ) det(A11 − λI) + (−1)1+k a1k det(A1k − λI).
k=2
By induction, each of det(A1k −λI) is a nth degree polynomial. Hence, (a11 −λ) det(A11 −λI)
is a (n + 1)th degree polynomial. This ends the proof.
Solution. Compute
−2 4 λ 0 −2 − λ 4
A − λI = − = .
−6 8 0 λ −6 8−λ
Therefore,
The roots of p(λ) are clearly λ1 = 4 and λ2 = 2. Therefore, the eigenvalues of A are λ1 = 4
and λ2 = 2.
170
Lecture 22
λ1 = 2, λ2 = 3, λ3 = −1.
Now that we know how to find eigenvalues, we can combine our work from the previous
lecture to find both the eigenvalues and eigenvectors of a given matrix A.
Example 22.5. For each eigenvalue of A from Example 22.4, find a basis for the corre-
sponding eigenspace.
Solution. Start with λ1 = 2:
−6 −6 −7
A − 2I = 3 3 3
0 0 1
After basic row reduction and back substitution, one finds that the null space of A − 2I is
spanned by
−1
v1 = 1 .
0
171
The Characteristic Polynomial
and therefore v3 is an eigenvector of A with eigenvalue λ3 . Notice that in this case, the 3 × 3
matrix A has three distinct eigenvalues and the eigenvectors
−1 −1 −2
{v1 , v2 , v3 } = 1 , 0 , 1
0 1 0
172
Lecture 22
basis for Rn of eigenvectors of A? In some cases, the answer is yes as the next example
demonstrates.
Example 22.7. Find the eigenvalues of A and a basis for each eigenspace.
2 0 0
A= 4 2 2
−2 0 1
and therefore the eigenvalues are λ1 = 1 and λ2 = 2. Notice that although p(λ) is a
polynomial of degree n = 3, it has only two distinct roots and hence A has only two
distinct eigenvalues. The eigenvalue λ2 = 2 is said to be repeated and λ1 = 1 is said to be
a simple eigenvalue. For λ1 = 1 one finds that the eigenspace Null(A − λ1 I) is spanned by
0
v1 = −2
1
and thus v1 is an eigenvector of A with eigenvalue λ1 = 1. Now consider λ2 = 2:
0 0 0
A − 2I = 4 0 2
−2 0 −1
Row reducing A − 2I one obtains
0 0 0 −2 0 −1
A − 2I = 4 0 2 ∼ 0 0 0 .
−2 0 −1 0 0 0
Therefore, rank(A − 2I) = 1 and thus by the Rank Theorem it follows that Null(A − 2I) is
a 2-dimensional eigenspace. Performing back substitution, one finds the following basis for
the λ2 -eigenspace:
−1 0
{v2 , v3 } = 0 , 1
2 0
Therefore, the eigenvectors
0 −1 0
{v1 , v2 , v3 } = −2 , 0 , 1
1 2 0
form a basis for R3 . Hence, for the repeated eigenvalue λ2 = 2 we were able to find two
linearly independent eigenvectors.
173
The Characteristic Polynomial
Before moving further with more examples, we need to introduce some notation regard-
ing the factorization of the characteristic polynomial. In the previous Example 22.7, the
characteristic polynomial was factored as p(λ) = (λ − 1)(λ − 2)2 and we found a basis for
R3 of eigenvectors despite the presence of a repeated eigenvalue. In general, if p(λ) is an
nth degree polynomial that can be completely factored into linear terms, then p(λ) can be
written in the form
p(λ) = (λ − λ1 )k1 (λ − λ2 )k2 · · · (λ − λp )kp
where k1 , k2 , . . . , kp are positive integers and the roots of p(λ) are then λ1 , λ2 , . . . , λk . Because
p(λ) is of degree n, we must have that k1 + k2 + · · ·+ kp = n. Motivated by this, we introduce
the following definition.
Definition 22.8: Suppose that A ∈ Mn×n has characteristic polynomial p(λ) that can be
factored as
p(λ) = (λ − λ1 )k1 (λ − λ2 )k2 · · · (λ − λp )kp .
The exponent ki is called the algebraic multiplicity of the eigenvalue λi . The dimension
Null(A − λi I) of the eigenspace associated to λi is called the geometric multiplicity of
λi .
For simplicity and whenever it is convenient, we will denote the geometric multiplicity of the
eigenvalue λi as
gi = dim(Null(A − λi I)).
In Example 22.7, we had p(λ) = (λ−1)(λ−2)2 and thus λ1 = 1 has algebraic multiplicity
k1 = 1 and λ2 = 2 has algebraic multiplicity k2 = 2. For λ1 = 1, we found one linearly
independent eigenvector, and therefore λ1 has geometric multiplicity g1 = 1. For λ1 = 2,
we found two linearly independent eigenvectors, and therefore λ2 has geometric multiplicity
g2 = 2. However, as we will see in the next example, the geometric multiplicity gi is in
general less than the algebraic multiplicity ki :
g i ≤ ki
174
Lecture 22
Example 22.10. Find the eigenvalues of A and a basis for each eigenspace:
2 4 3
A = −4 −6
−3
3 3 1
For each eigenvalue of A, find its algebraic and geometric multiplicity. Does R3 have a basis
of eigenvectors of A?
Solution. One computes
p(λ) = −λ3 − 3λ2 + 4 = −(λ − 1)(λ + 2)2
and therefore the eigenvalues of A are λ1 = 1 and λ2 = −2. The algebraic multiplicity of λ1
is k1 = 1 and that of λ2 is k2 = 2. For λ1 = 1 we compute
1 4 3
A − I = −4 −7 −3
3 3 0
and then one finds that
1
v1 = −1
1
is a basis for the λ1 -eigenspace. Therefore, the geometric multiplicity of λ1 is g1 =. For
λ2 = −2 we compute
4 4 3 4 4 3 1 1 1
A − λ2 I = −4 −4 −3 ∼ 1 1 1 ∼ 0 0 1
3 3 3 0 0 0 0 0 0
Therefore, since rank(A − λ2 I) = 2, the geometric multiplicity of λ2 = −2 is g2 = 1, which
is less than the algebraic multiplicity k2 = 2. An eigenvector corresponding to λ2 = −2 is
−1
v2 = 1
0
Therefore, for the repeated eigenvalue λ2 = −2, we are able to find only one linearly inde-
pendent eigenvector. Therefore, it is not possible to construct a basis for R3 consisting of
eigenvectors of A.
Hence, in the previous example, there does not exist a basis of R3 of eigenvectors of A
because for one of the eigenvalues (namely λ2 ) the geometric multiplicity was less than the
algebraic multiplicity:
g2 < d2 .
In the next lecture, we will elaborate on this situation further.
Example 22.11. Find the algebraic and geometric multiplicities of each eigenvalue of the
matrix
−7 1 0
A = 0 −7 1 .
0 0 −7
175
The Characteristic Polynomial
A = PBP−1.
Theorem 22.13: If A and B are similar matrices then the following are true:
(a) rank(A) = rank(B)
(b) det(A) = det(B)
(c) A and B have the same eigenvalues
Proof. We will prove part (c). If A and B are similar then A = PAP−1 for some matrix P.
Then
det(A − λI) = det(A − λPP−1 )
= det(PBP−1 − λPP−1)
= det(P(B − λI)P−1)
= det(B − λI)
176
Lecture 22
Thus, A and B have the same characteristic polynomial, and hence the same eigenvalues.
In the next lecture, we will see that if Rn has a basis of eigenvectors of A then A is similar
to a diagonal matrix.
177
The Characteristic Polynomial
178
Lecture 23
Diagonalization
Theorem 23.1: Let A be a triangular matrix (either upper or lower). Then the eigen-
values of A are its diagonal entries.
Proof. We will prove the theorem for the case n = 3 and A is upper triangular; the general
case is similar. Suppose then that A is a 3 × 3 upper triangular matrix:
a11 a12 a13
A = 0 a22 a23
0 0 a33
Then
a11 − λ a12 a13
A − λI = 0 a22 − λ a23 .
0 0 a33 − λ
and thus the characteristic polynomial of A is
p(λ) = det(A − λI) = (a11 − λ)(a22 − λ)(a33 − λ)
and the roots of p(λ) are
λ1 = a11 , λ2 = a22 , λ3 = a33 .
In other words, the eigenvalues of A are simply the diagonal entries of A.
We now introduce a very special type of a triangular matrix, namely, a diagonal matrix.
Definition 23.3: A matrix D whose off-diagonal entries are all zero is called a diagonal
matrix.
A diagonal matrix is clearly also a triangular matrix and therefore the eigenvalues of a
diagonal matrix D are simply the diagonal entries of D. Moreover,
the powers of a diagonal
λ 0
matrix are easy to compute. For example, if D = 1 then
0 λ2
2
2 λ1 0 λ1 0 λ1 0
D = =
0 λ2 0 λ2 0 λ22
23.2 Diagonalization
Recall that two matrices A and B are said to be similar if there exists an invertible matrix
P such that
A = PBP−1.
A very simple type of matrix is a diagonal matrix since many computations with diagonal
matrices are trivial. The problem of diagonalization is thus concerned with answering the
question of whether a given matrix is similar to a diagonal matrix. Below is the formal
definition.
180
Lecture 23
A = PDP−1 .
How do we determine when a given matrix A is diagonalizable? Let us first determine what
conditions need to be met for a matrix A to be diagonalizable. Suppose then
that A is diag-
onalizable. Then by Definition 23.4, there exists an invertible matrix P = v1 v2 · · · vn
and a diagonal matrix
λ1 0 . . . 0
0 λ2 . . . 0
D = .. .. . . ..
. . . .
0 0 . . . λn
such that A = PDP−1. Multiplying on the right both sides of the equation A = PDP−1
by the matrix P we obtain that
AP = PD.
Now
AP = Av1 Av2 · · · Avn
while on the other hand
PD = λ1 v1 λ2 v2 · · · λn vn .
Therefore, since it holds that AP = PD then
Av1 Av2 · · · Avn = λ1 v1 λ2 v2 · · · λn vn .
Avi = λi vi .
Thus, the columns v1 , v2 , . . . , vn of P are eigenvectors of A and form a basis for Rn because
P is invertible. In conclusion, if A is diagonalizable then Rn has a basis consisting of
eigenvectors of A.
Suppose instead that {v1 , v2 , . . . , vn } is a basis of Rn consisting of eigenvectors of A. Let
λ1 , λ2 , . . . , λn be the eigenvalues of A associated to v1 , v2 , . . . , vn , respectively, and set
P = v1 v2 · · · vn .
181
Diagonalization
The punchline with Theorem 23.5 is that the problem of diagonalization of a matrix A
is equivalent to finding a basis of Rn consisting of eigenvectors of A. We will see in some of
the examples below that it is not always possible to diagonalize a matrix.
182
Lecture 23
Theorem 23.7: A matrix A is diagonalizable if and only if the algebraic and geometric
multiplicities of each eigenvalue are equal.
183
Diagonalization
184
Lecture 23
Example 23.11. Suppose that A has eigenvector v with corresponding eigenvalue λ. Show
that if A is invertible then v is an eigenvector of A−1 with corresponding eigenvalue λ1 .
Example 23.12. Suppose that A and B are n × n matrices such that AB = BA. Show
that if v is an eigenvector of A with corresponding eigenvalue then v is also an eigenvector
of B with corresponding eigenvalue λ.
185
Diagonalization
186
Lecture 24
Lecture 24
Diagonalization of Symmetric
Matrices
The Second Derivative Test of multivariable calculus then says that if P = (a1 , a2 , . . . , an )
is a critical point of f , that is
∂f ∂f ∂f
(P ) = (P ) = · · · = (P ) = 0
∂x1 ∂x2 ∂xn
then
(i) P is a local minimum point of f if the matrix Hess(f ) has all positive eigenvalues,
(ii) P is a local maximum point of f if the matrix Hess(f ) has all negative eigenvalues,
and
187
Diagonalization of Symmetric Matrices
(iii) P is a saddle point of f if the matrix Hess(f ) has negative and positive eigenvalues.
In general, the eigenvalues of a matrix with real entries can be complex numbers. For
example, the matrix
0 −1
A=
1 0
has characteristic polynomial
p(λ) = λ2 + 1
√ √
the roots of which are clearly λ1 = −1 = i and λ2 = − −1 = −i. Thus, in general,
a matrix whose entries are all real numbers may have complex eigenvalues. However, for
symmetric matrices we have the following.
Theorem 24.1: If A is a symmetric matrix then all of its eigenvalues are real numbers.
λ1 v1T v2 = (λ1 v1 )T v2
= (Av1 )T v2
= v1T AT v2
= v1T Av2
= v1T (λ2 v2 )
= λ2 v1T v2 .
Therefore, λ1 v1T v2 = λ2 v1T v2 which implies that (λ1 − λ2 )v1T v2 = 0. But since (λ1 − λ2 ) 6= 0
then we must have v1T v2 = 0, that is, v1 and v2 are orthogonal.
188
Lecture 24
all matrices are diagonalizable. As it turns out, any symmetric A is diagonalizable and
moreover (and perhaps more importantly) there exists an orthogonal eigenvector matrix P
that diagonalizes A. The full statement is below.
The proof of the theorem is not hard but we will omit it. The punchline of Theorem 24.3
is that, for the case of a symmetric matrix, we will never encounter the situation where
the geometric multiplicity is strictly less than the algebraic multiplicity. Moreover, we are
guaranteed to find an orthogonal matrix that diagonalizes a given symmetric matrix.
Example 24.4. Find an orthogonal matrix P that diagonalizes the symmetric matrix
1 0 −1
A= 0 1 1 .
−1 1 2
Solution. The characteristic polynomial of A is
p(λ) = det(A − λI) = λ3 − 4λ2 + 3λ = λ(λ − 1)(λ − 3)
The eigenvalues of A are λ1 = 0, λ2 = 1 and λ3 = 3. Eigenvectors of A associated to
λ1 , λ2 , λ3 are
u1 = (1, −1, 1),   u2 = (1, 1, 0),   u3 = (−1, 1, 2).
As expected by Theorem 24.2, the eigenvectors u1 , u2 , u3 form an orthogonal set:
uT1 u2 = 0, uT1 u3 = 0, uT2 u3 = 0.
To find an orthogonal matrix P that diagonalizes A we must normalize the eigenvectors
u1 , u2 , u3 to obtain an orthonormal basis {v1 , v2 , v3 }. To that end, first compute uT1 u1 = 3,
uT2 u2 = 2, and uT3 u3 = 6. Then let v1 = (1/√3)u1 , let v2 = (1/√2)u2 , and let v3 = (1/√6)u3 . Therefore,
an orthogonal matrix that diagonalizes A is
P = [ v1 v2 v3 ] = [  1/√3   1/√2  −1/√6 ]
                   [ −1/√3   1/√2   1/√6 ]
                   [  1/√3    0     2/√6 ]
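As a quick numerical check (an illustration, not part of the notes), the following Python sketch verifies that this P is orthogonal and that PT AP is the diagonal matrix of eigenvalues:

import numpy as np

A = np.array([[ 1.0, 0.0, -1.0],
              [ 0.0, 1.0,  1.0],
              [-1.0, 1.0,  2.0]])

# Columns are the normalized eigenvectors v1, v2, v3 found above.
P = np.column_stack([
    np.array([ 1.0, -1.0, 1.0]) / np.sqrt(3),
    np.array([ 1.0,  1.0, 0.0]) / np.sqrt(2),
    np.array([-1.0,  1.0, 2.0]) / np.sqrt(6),
])

print(np.allclose(P.T @ P, np.eye(3)))   # True: P is orthogonal
print(np.round(P.T @ A @ P, 10))         # the diagonal matrix diag(0, 1, 3)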
Example 24.5. Let A and B be n × n matrices. Show that if A is symmetric then the
matrix C = BABT is also a symmetric matrix.
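A solution sketch (the notes do not include one here): using the transpose rules (XY)T = YT XT and (XT )T = X, we compute CT = (BABT )T = (BT )T AT BT = BAT BT = BABT = C, where AT = A because A is symmetric. Since CT = C, the matrix C is symmetric.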
Lecture 25
In this lecture, we will see how linear algebra is used in Google’s webpage ranking algorithm
used in everyday Google searches.
“In this paper, we present Google, a prototype of a large-scale search engine which makes
heavy use of the structure present in hypertext. Google is designed to crawl and index the
Web efficiently and produce much more satisfying search results than existing systems. The
prototype with a full text and hyperlink database of at least 24 million pages is available at
http://google.stanford.edu/ .”
[1] A.N. Langville and C.D. Meyer, Google's PageRank and Beyond, Princeton University Press, 2006.
[2] J. Kleinberg, Authoritative sources in a hyperlinked environment, Journal of the ACM, 46, 1999; 9th ACM-SIAM Symposium on Discrete Algorithms.
[3] S. Brin and L. Page, The anatomy of a large-scale hypertextual Web search engine, Computer Networks and ISDN Systems, 33:107–117, 1998.
The PageRank Algorithm
In both models (Brin and Page's PageRank [3] and Kleinberg's HITS [2]), the web is defined as a directed graph, where the nodes represent
webpages and the directed arcs represent hyperlinks, see Figure 25.1.
The hyperlink matrix H is defined by Hij = 1/Nj if page j links to page i, and Hij = 0 otherwise, where Nj is the number of outlinks from page j.
From the previous example, we see that the PageRank of each page can be found by
solving an eigenvalue/eigenvector problem. However, when dealing with large networks such
as the internet, the size of the problem is in the billions (8.1 billion in 2006) and directly
solving the equations is not possible. Instead, an iterative method called the power method
is used. One starts with an initial guess, say x0 = (1/4, 1/4, 1/4, 1/4). Then one updates the guess
by computing
x1 = Hx0 .
In other words, we have a discrete dynamical system
xk+1 = Hxk .
A natural question is under what conditions the limiting value of the sequence (xk ) exists and is independent of the initial guess x0 .
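To make the iteration concrete, here is a small Python sketch of the power method on a hypothetical 4-page network (the matrix H below is a made-up example, not one from the notes):

import numpy as np

# Made-up column-stochastic hyperlink matrix: entry (i, j) is 1/Nj
# if page j links to page i, and 0 otherwise.
H = np.array([[0.0, 0.5, 0.0, 0.0],
              [1/3, 0.0, 0.0, 0.5],
              [1/3, 0.0, 0.0, 0.5],
              [1/3, 0.5, 1.0, 0.0]])

x = np.full(4, 1/4)      # initial guess x0 = (1/4, 1/4, 1/4, 1/4)
for _ in range(100):     # iterate x_{k+1} = H x_k
    x = H @ x

print(np.round(x, 4))    # approximate PageRank vector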
[Figure 25.2: Cycles present in the network. A five-node directed graph containing a cycle, shown together with its 5 × 5 hyperlink matrix H.]
Now consider the network displayed in Figure 25.3. If we remove the cycle we are still
left with a dangling node, namely node 1 (e.g., a pdf or image file, which has no outlinks). Starting with x0 = (1/5, . . . , 1/5) results in
limk→∞ xk = 0.
To avoid the problems caused by dangling nodes and cycles, Brin and Page used the notion of
a random surfer to adjust H. To deal with a dangling node, Brin and Page replaced
the associated zero column with the vector (1/n)1 = (1/n, 1/n, . . . , 1/n). The justification for this
adjustment is that if a random surfer reaches a dangling node, the surfer will “teleport” to
any page in the web with equal probability. The new updated hyperlink matrix H∗ may still
not have the desired properties. To deal with cycles, a surfer may abandon the hyperlink
structure of the web by occasionally moving to a random page by typing its address in the
browser. With these adjustments, a random surfer now spends only a proportion of his
time using the hyperlink structure of the web to visit pages. Hence, let 0 < α < 1 be
the proportion of time the random surfer uses the hyperlink structure. Then the transition
matrix is
G = αH∗ + (1 − α)(1/n)J,
where J is the all-ones matrix. The matrix G goes by the name of the Google matrix, and it is reported that Google uses α = 0.85. The Google matrix G is now a primitive and
stochastic matrix. Stochastic means that all its columns are probability vectors, i.e., non-
negative vectors whose components sum to 1. Primitive means that there exists k ≥ 1 such
that Gk has all positive entries (k = 1 in our case). With these definitions, we now have the
following theorem.
Theorem: Let G be a primitive stochastic n × n matrix. Then:
(i) λ1 = 1 is an eigenvalue of G;
(ii) every other eigenvalue λ of G satisfies |λ| < 1;
(iii) there is an eigenvector q of G corresponding to λ1 = 1 with all positive entries;
(iv) the vector q is the unique probability vector which is an eigenvector of G with
eigenvalue λ1 = 1.
Proof. We will prove a special case [4]. Assume for simplicity that G is positive (this is the
case for the Google matrix). Suppose that x = Gx and that x has mixed signs. Then, since
the entries Gij are positive and the xj do not all have the same sign, the triangle inequality
is strict:
|xi | = |Gi1 x1 + Gi2 x2 + · · · + Gin xn | < Gi1 |x1 | + Gi2 |x2 | + · · · + Gin |xn |.
Summing these inequalities over i = 1, 2, . . . , n and using that each column of G sums to 1,
we obtain
|x1 | + · · · + |xn | < (G11 + · · · + Gn1 )|x1 | + · · · + (G1n + · · · + Gnn )|xn | = |x1 | + · · · + |xn |,
which is a contradiction. Therefore, every eigenvector in the λ1 = 1 eigenspace has entries
that are all of one sign (all negative or all positive). One then shows that the eigenspace
corresponding to λ1 = 1 is 1-dimensional. This proves that there is a unique probability
vector q such that
q = Gq.
[4] K. Bryan and T. Leise, The $25,000,000,000 Eigenvector: The Linear Algebra Behind Google, SIAM Review, 48(3):569–581, 2006.
Let q0 be a probability vector, let q be as above, and let v2 , . . . , vn be the remaining
eigenvectors of G, with corresponding eigenvalues λ2 , . . . , λn (assume for simplicity that
q, v2 , . . . , vn form a basis of Rn ). Then q0 = q + c2 v2 + · · · + cn vn and therefore
Gk q0 = Gk (q + c2 v2 + · · · + cn vn )
= Gk q + c2 Gk v2 + · · · + cn Gk vn
= q + c2 λ2^k v2 + · · · + cn λn^k vn .
Since |λi | < 1 for i = 2, . . . , n, each term λi^k vi tends to 0 as k → ∞, and therefore
Gk q0 → q. This is why the power method converges to the PageRank vector q from any
initial probability vector q0 .
In practice the matrix G is never formed explicitly. Writing a for the vector with aj = 1
if page j is a dangling node and aj = 0 otherwise, we have
G = αH∗ + (1 − α)(1/n)11T
= αH + (1/n)1(αa + (1 − α)1)T ,
and H is very sparse and requires minimal storage. A vector-matrix multiplication generally
requires O(n²) operations (n ≈ 8,000,000,000 in 2006). Estimates show that the average
webpage has about 10 outlinks, so H has about 10n non-zero entries. This means that
multiplication with H reduces to O(n) computation. Aside from being very simple, the
power method is a matrix-free method, i.e., no manipulation of the matrix H is done. Brin
and Page, and others, have confirmed that only 50-100 iterations are needed for a satisfactory
approximation of the PageRank vector q for the web.
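The following Python sketch (an illustration with made-up data, not code from the notes) shows how a power-method step with G can be computed using only the sparse matrix H, the dangling-node indicator a, and α, without ever forming G:

import numpy as np
from scipy.sparse import csc_matrix

alpha = 0.85
n = 5

# Made-up sparse hyperlink matrix H; page 0 is a dangling node
# (its column is entirely zero).
rows = [0, 2, 3, 1, 4, 1, 2]
cols = [1, 1, 1, 2, 2, 3, 4]
vals = [1/3, 1/3, 1/3, 1/2, 1/2, 1.0, 1.0]
H = csc_matrix((vals, (rows, cols)), shape=(n, n))

# a[j] = 1 exactly when column j of H is zero (a dangling node).
a = (np.asarray(H.sum(axis=0)).ravel() == 0).astype(float)

x = np.full(n, 1 / n)
for _ in range(100):
    # x <- Gx = alpha*H*x + (1/n)*(alpha*(a.x) + (1 - alpha))*1,
    # using that the entries of x sum to 1.
    x = alpha * (H @ x) + (alpha * (a @ x) + (1 - alpha)) / n

print(np.round(x, 4))   # approximate PageRank vector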
Lecture 26
Discrete Dynamical Systems
A discrete dynamical system is a sequence of vectors xk , for k = 0, 1, 2, . . ., generated by an equation of the form
xk+1 = Axk ,
where A is an n × n matrix.
The vectors xk are called the state of the dynamical system and x0 is the initial condition
of the system. Once the initial condition x0 is fixed, the remaining state vectors x1 , x2 , . . . ,
can be found by iterating the equation xk+1 = Axk .
As an example, consider a city and its surrounding suburbs, and let c and s denote the
fractions of the total population living in the city and in the suburbs, respectively. For
simplicity, we assume that c + s = 1, i.e., c and s are population percentages of the total
population. Suppose that in the year 1900, the city population was c0 and the suburban
population was s0 . Suppose it is known that after each year 5% of the city's population
moves to the suburbs and that 3% of the suburban population moves to the city. Hence, the
population in the city in year 1901 is
c1 = 0.95c0 + 0.03s0 ,
while the population in the suburbs in year 1901 is
s1 = 0.05c0 + 0.97s0 .
The equations
c1 = 0.95c0 + 0.03s0
s1 = 0.05c0 + 0.97s0
can be written in matrix form as
" # " #" #
c1 0.95 0.03 c0
= .
s1 0.05 0.97 s0
Performing the same analysis for the next year, the population in 1902 is
" # " #" #
c2 0.95 0.03 c1
= .
s2 0.05 0.97 s1
Hence, the population movement is a linear dynamical system with matrix and state vector
" # " #
0.95 0.03 ck
A= , xk = .
0.05 0.97 sk
Suppose that the initial population state vector is
" #
0.70
x0 = .
0.30
Then, " #" # " #
0.95 0.03 0.70 0.674
x1 = Ax0 = = .
0.05 0.97 0.30 0.326
Then, " #" # " #
0.95 0.03 0.674 0.650
x2 = Ax1 = = .
0.05 0.97 0.326 0.349
In a similar fashion, one can compute that up to 3 decimal places:
" # " #
0.375 0.375
x500 = , x1000 = .
0.625 0.625
It seems as though the population distribution converges to a steady state or equilibrium.
We predict that in the year 2400, 38% of the total population will live in the city and 62%
in the suburbs.
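A few lines of Python (an illustration, not part of the notes) reproduce these computations:

import numpy as np

A = np.array([[0.95, 0.03],
              [0.05, 0.97]])
x = np.array([0.70, 0.30])   # initial state x0

for _ in range(1000):        # iterate x_{k+1} = A x_k
    x = A @ x

print(np.round(x, 3))        # [0.375 0.625], the apparent steady state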
Our computations in the population model indicate that the population distribution is
reaching a sort of steady state or equilibrium, which we now define.
We say that a vector q is an equilibrium state of the system xk+1 = Axk if Aq = q or, equivalently, if
q ∈ Null(A − I).
Example 26.3. Find the equilibrium states of the matrix from the population model
" #
0.95 0.03
A= .
0.05 0.97
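A solution sketch (the notes do not include one here): an equilibrium q satisfies (A − I)q = 0, and
A − I = [ −0.05   0.03 ]
        [  0.05  −0.03 ].
The two rows are multiples of each other, so the only equation is −0.05q1 + 0.03q2 = 0, that is, q1 = (3/5)q2 . The equilibrium states are therefore the vectors q = t(3, 5) for t ∈ R, and the unique equilibrium that is also a probability vector is q = (3/8, 5/8) = (0.375, 0.625), matching the steady state observed above.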
Does the initial condition of the population x0 change the long term behavior of the
discrete dynamical system? We will know the answer once we perform an eigenvalue analysis
on A (Lecture 22). As a preview, we will use the fact that
xk = Ak x0
and then write x0 in an appropriate basis that reveals how A acts on x0 . To see how the
last equation was obtained, notice that
x1 = Ax0
and therefore
x2 = Ax1 = A(Ax0 ) = A2 x0
and therefore
x3 = Ax2 = A(A2 x0 ) = A3 x0
etc.
Definition 26.4: Consider the discrete dynamical system xk+1 = Axk where A ∈ Rn×n .
The origin 0 ∈ Rn is said to be asymptotically stable if for any initial condition x0 ∈ Rn
of the dynamical system we have
limk→∞ xk = limk→∞ Ak x0 = 0.
The following theorem characterizes when a discrete linear dynamical system is asymptoti-
cally stable.
Theorem 26.5: The origin is asymptotically stable for the system xk+1 = Axk if and
only if every eigenvalue λ of A satisfies |λ| < 1.
Proof (in the case that A has n linearly independent eigenvectors v1 , . . . , vn with eigenvalues
λ1 , . . . , λn and |λi | < 1 for all i): Since v1 , . . . , vn form a basis of Rn , any initial condition
x0 can be written as
x0 = c1 v1 + · · · + cn vn .
Since Ak vi = λi^k vi , we have
xk = Ak x0
= Ak (c1 v1 + · · · + cn vn )
= c1 Ak v1 + · · · + cn Ak vn
= c1 λ1^k v1 + · · · + cn λn^k vn .
Since |λi | < 1, we have λi^k → 0 as k → ∞, and therefore
limk→∞ xk = c1 limk→∞ λ1^k v1 + · · · + cn limk→∞ λn^k vn = 0v1 + · · · + 0vn = 0.
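Numerically, asymptotic stability can be checked by testing whether every eigenvalue of A lies strictly inside the unit circle; here is a short Python sketch (an illustration, not part of the notes):

import numpy as np

def is_asymptotically_stable(A):
    """Return True if every eigenvalue of A satisfies |lambda| < 1."""
    return bool(np.all(np.abs(np.linalg.eigvals(A)) < 1))

# The population matrix has eigenvalues 1 and 0.92, so the origin is
# NOT asymptotically stable (states approach a nonzero equilibrium).
A = np.array([[0.95, 0.03],
              [0.05, 0.97]])
print(is_asymptotically_stable(A))                      # False

# A matrix with eigenvalues 0.5 and -0.5: asymptotically stable.
print(is_asymptotically_stable(np.diag([0.5, -0.5])))   # True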