Mathematical Foundations for Data
Science (SSCSH ZC416)
Lecture-1
BITS Pilani Dr. Pritee Agarwal
(pritee.a@[Link])
Pilani Campus
Agenda
• Matrices
• Gauss Elimination
• Consistency of Linear Systems
Matrices
A matrix is a rectangular array of numbers or functions
which we will enclose in brackets. For example,
(1)   [0.3  1  5;  0  0.2  16],   [a11 a12 a13; a21 a22 a23; a31 a32 a33],   [e^(−x)  2x²; e^(6x)  4x],   [a1  a2  a3],   [4; 1/2]
The numbers (or functions) are called entries or, less
commonly, elements of the matrix.
The first matrix in (1) has two rows, which are the
horizontal lines of entries.
Matrix – Notations
We shall denote matrices by capital boldface letters A, B,
C, … , or by writing the general entry in brackets; thus A =
[ajk], and so on.
By an m × n matrix (read m by n matrix) we mean a matrix
with m rows and n columns—rows always come first! m × n
is called the size of the matrix. Thus an m × n matrix is of the
form
(2)   A = [ajk] = [a11 a12 … a1n; a21 a22 … a2n; … ; am1 am2 … amn]
Vectors
A vector is a matrix with only one row or column. Its entries
are called the components of the vector.
We shall denote vectors by lowercase boldface letters a, b, …
or by its general component in brackets, a = [aj], and so on.
Our special vectors in (1) suggest that a (general) row vector
is of the form a = [a1 a2 … an].
A column vector is of the form b = [b1; b2; … ; bm].
Equality of Matrices
Two matrices A = [ajk] and B = [bjk] are equal, written A = B, if
and only if (1) they have the same size and (2) the
corresponding entries are equal, that is, a11 = b11, a12 = b12, and
so on.
Matrices that are not equal are called different. Thus, matrices
of different sizes are always different.
Algebra of Matrices
Addition of Matrices
The sum of two matrices A = [ajk] and B = [bjk] of the same size is written A + B and
has the entries ajk + bjk obtained by adding the corresponding entries of A and B.
Matrices of different sizes cannot be added.
Scalar Multiplication (Multiplication by a Number)
The product of any m × n matrix A = [ajk] and any scalar c (number c) is written cA
and is the m × n matrix cA = [cajk] obtained by multiplying each entry of A by c.
(a) A + B = B + A                                      (a) c(A + B) = cA + cB
(b) (A + B) + C = A + (B + C) (written A + B + C)      (b) (c + k)A = cA + kA
(c) A + 0 = A                                          (c) c(kA) = (ck)A (written ckA)
(d) A + (−A) = 0.                                      (d) 1A = A.
Here 0 denotes the zero matrix (of size m × n), that is, the m × n matrix with all entries
zero.
Matrix Multiplication
Multiplication of a Matrix by a Matrix
The product C = AB (in this order) of an m × n matrix A = [ajk]
times an r × p matrix B = [bjk] is defined if and only if r = n and
is then the m × p matrix C = [cjk] with entries
(1)   cjk = aj1 b1k + aj2 b2k + … + ajn bnk = Σ_{l=1}^{n} ajl blk,   j = 1, …, m;  k = 1, …, p.
The condition r = n means that the second factor, B, must have
as many rows as the first factor has columns, namely n. A
diagram of sizes that shows when matrix multiplication is
possible is as follows:
A B = C
[m × n] [n × p] = [m × p].
Matrix Multiplication
EXAMPLE 1
Matrix Multiplication
AB = [3 5 −1; 4 0 2; −6 −3 2] [2 −2 3 1; 5 0 7 8; 9 −4 1 1] = [22 −2 43 42; 26 −16 14 6; −9 4 −37 −28]
Here c11 = 3 · 2 + 5 · 5 + (−1) · 9 = 22, and so on. The entry in
the box is c23 = 4 · 3 + 0 · 7 + 2 · 1 = 14. The product BA is
not defined.
Matrix Multiplication
Matrix Multiplication Is Not Commutative,
AB ≠ BA in General
This is illustrated by Example 1, where one of the two
products is not even defined. But it also holds for square
matrices. For instance,
[1 1; 100 100] [−1 1; 1 −1] = [0 0; 0 0],
but
[−1 1; 1 −1] [1 1; 100 100] = [99 99; −99 −99].
It is interesting that this also shows that AB = 0 does not
necessarily imply BA = 0 or A = 0 or B = 0.
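This behaviour is easy to verify numerically. The following NumPy sketch (added here; not part of the original slides) multiplies the two matrices from the example in both orders:

import numpy as np

A = np.array([[1, 1], [100, 100]])
B = np.array([[-1, 1], [1, -1]])

print(A @ B)   # [[0 0], [0 0]]   -> AB is the zero matrix
print(B @ A)   # [[99 99], [-99 -99]]   -> BA is not zero, so AB != BA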
Transposition of Matrices &
Vectors
Transposition of Matrices and Vectors
The transpose of an m × n matrix A = [ajk] is the n × m matrix
AT (read A transpose) that has the first row of A as its first
column, the second row of A as its second column, and so on.
Thus the transpose of A in (2) is AT = [akj], written out
(9)   AT = [akj] = [a11 a21 … am1; a12 a22 … am2; … ; a1n a2n … amn]
As a special case, transposition converts row vectors to
column vectors and conversely.
Transposition of Matrices
Rules for transposition are
(10)
(a) (AT)T = A
(b) (A + B)T = AT + BT
(c) (cA)T = cAT
(d) (AB)T = BT AT.
CAUTION! Note that in (10d) the transposed matrices are in reversed order.
Special Matrices
• Symmetric: aij = aji
• Skew Symmetric : aij = - aji
• Triangular: Upper Triangular aij = 0 for all i > j
Lower Triangular aij = 0 for all i < j
• Diagonal Matrix: aij = 0 for all i ≠ j
• Sparse Matrix: Many zero entries and only a few non-zero entries
Linear Systems
A linear system of m equations in n unknowns x1, … , xn is
a set of equations of the form
(1)   a11 x1 + a12 x2 + … + a1n xn = b1
      a21 x1 + a22 x2 + … + a2n xn = b2
      …
      am1 x1 + am2 x2 + … + amn xn = bm
The system is called linear because each variable xj appears
in the first power only, just as in the equation of a straight
line. a11, … , amn are given numbers, called the coefficients
of the system. b1, … , bm on the right are also given numbers.
If all the bj are zero, then (1) is called a homogeneous
system. If at least one bj is not zero, then (1) is called a
nonhomogeneous system.
Elementary Row Operations
Elementary Operations for Equations:
Interchange of two equations
Addition of a constant multiple of one equation to another
equation
Multiplication of an equation by a nonzero constant c
Clearly, the interchange of two equations does not alter the
solution set. Neither does their addition because we can
undo it by a corresponding subtraction. Similarly for their
multiplication, which we can undo by multiplying the new
equation by 1/c (since c ≠ 0), producing the original equation.
Elementary Row Operations
We now call a linear system S1 row-equivalent to a linear system S2 if S1 can be
obtained from S2 by (finitely many!) row operations. This justifies Gauss
elimination and establishes the following result.
Theorem 1
Row-Equivalent Systems
Row-equivalent linear systems have the same set of solutions.
Because of this theorem, systems having the same solution sets are often called
equivalent systems. But note well that we are dealing with row operations. No
column operations on the augmented matrix are permitted in this context
because they would generally alter the solution set.
Row Echelon Form (REF)
At the end of the elementary row operations, the resulting form of the
coefficient matrix, the augmented matrix, and the system itself is called
the row echelon form.
In it, rows of zeros, if present, are the last rows, and, in each
nonzero row, the leftmost nonzero entry is farther to the right
than in the previous row. For instance, a coefficient matrix and its
augmented matrix in row echelon form are

(8)   [3 2 1; 0 −1/3 1/3; 0 0 0]   and   [3 2 1 | 3; 0 −1/3 1/3 | −2; 0 0 0 | 12].
Note that we do not require that the leftmost nonzero entries be 1
since this would have no theoretic or numeric advantage.
Row Echelon Form and
Information from it
The original system of m equations in n unknowns has
augmented matrix [A | b]. This is to be row reduced to
matrix [R | f].
The two systems Ax = b and Rx = f are equivalent: if either
one has a solution, so does the other, and the solutions are
identical.
Linear Systems – Matrix form
Matrix Form of the Linear System (1).
From the definition of matrix multiplication we see that the
m equations of (1) may be written as a single vector
equation
(2) Ax = b
where the coefficient matrix A = [ajk] is the m × n matrix in (2), and
x = [x1; x2; … ; xn]   and   b = [b1; b2; … ; bm]
are column vectors.
Matrix Form of Linear System
Matrix Form of the Linear System (1). (continued)
We assume that the coefficients ajk are not all zero, so that A is not a zero matrix.
Note that x has n components, whereas b has m components. The matrix
à = [A | b] = [a11 … a1n | b1; a21 … a2n | b2; … ; am1 … amn | bm]
is called the augmented matrix of the system (1). The dashed vertical line could
be omitted, as we shall do later. It is merely a reminder that the last column of
à did not come from matrix A but came from vector b. Thus, we augmented the
matrix A.
Gauss Elimination and Back
Substitution
Triangular form:
Triangular means that all the nonzero entries of the
corresponding coefficient matrix lie above the diagonal and
form an upside-down 90° triangle. Then we can solve the
system by back substitution.
Since a linear system is completely determined by its
augmented matrix, Gauss elimination can be done by
merely considering the matrices.
(We do this again in the next example, emphasizing the matrices by writing
them first and the equations behind them, just as a help in order not to lose
track.)
Gauss Elimination
Solve the linear system
x1 − x2 + x3 = 0
−x1 + x2 − x3 = 0
10x2 + 25x3 = 90
20x1 + 10x2 = 80.
Solution by Gauss Elimination.
This system could be solved rather quickly by noticing its
particular form. But this is not the point. The point is that the
Gauss elimination is systematic and will work in general,
also for large systems. We apply it to our system and then do
back substitution.
Example – Gauss Elimination
Solution by Gauss Elimination. (continued)
As indicated, let us write the augmented matrix of the system first and then the
system itself:
Augmented Matrix à Equations
Pivot 1 Pivot 1 x1 x2 x3 0
1 1 1 0
1 1 1 0 x1 x2 x3 0
Eliminate
Eliminate 0 10 25 90 10 x2 25 x3 90
20 10 0 80 20 x1 10 x2 80.
Gauss Elimination
Solution by Gauss Elimination. (continued)
Step 1. Elimination of x1
Call the first row of A the pivot row and the first equation
the pivot equation. Call the coefficient 1 of its x1-term the
pivot in this step. Use this equation to eliminate x1
(get rid of x1) in the other equations. For this, do:
Add 1 times the pivot equation to the second equation.
Add −20 times the pivot equation to the fourth equation.
This corresponds to row operations on the augmented
matrix as indicated in BLUE behind the new matrix in (3). So
the operations are performed on the preceding matrix.
Gauss Elimination
Solution by Gauss Elimination. (continued)
Step 1. Elimination of x1 (continued)
The result is
(3)   [ 1  −1   1 |  0 ]                         x1 − x2 + x3 = 0
      [ 0   0   0 |  0 ]   Row 2 + Row 1         0 = 0
      [ 0  10  25 | 90 ]                         10x2 + 25x3 = 90
      [ 0  30 −20 | 80 ]   Row 4 − 20 Row 1      30x2 − 20x3 = 80.
Gauss Elimination
Solution by Gauss Elimination. (continued)
Step 2. Elimination of x2
The first equation remains as it is. We want the new second equation to serve as
the next pivot equation. But since it has no x2-term (in fact, it is 0 = 0), we must
first change the order of the equations and the corresponding rows of the new
matrix. We put 0 = 0 at the end and move the third equation and the fourth
equation one place up. This is called partial pivoting (as opposed to the rarely
used total pivoting, in which the order of the unknowns is also changed).
Gauss Elimination
Solution by Gauss Elimination. (continued)
Step 2. Elimination of x2 (continued)
It gives
                    [ 1  −1   1 |  0 ]           x1 − x2 + x3 = 0
Pivot 10 →          [ 0  10  25 | 90 ]           10x2 + 25x3 = 90
Eliminate 30x2 →    [ 0  30 −20 | 80 ]           30x2 − 20x3 = 80
                    [ 0   0   0 |  0 ]           0 = 0.
Gauss Elimination
Solution by Gauss Elimination. (continued)
Step 2. Elimination of x2 (continued)
To eliminate x2, do:
Add −3 times the pivot equation to the third equation.
The result is
(4)   [ 1  −1   1 |    0 ]                          x1 − x2 + x3 = 0
      [ 0  10  25 |   90 ]                          10x2 + 25x3 = 90
      [ 0   0 −95 | −190 ]   Row 3 − 3 Row 2        −95x3 = −190
      [ 0   0   0 |    0 ]                          0 = 0.
Gauss Elimination
Solution by Gauss Elimination. (continued)
Back Substitution. Determination of x3, x2, x1 (in this order)
Working backward from the last to the first equation of this
“triangular” system (4), we can now readily find x3, then x2,
and then x1:
−95x3 = −190,   hence x3 = 2
10x2 + 25x3 = 90,   hence x2 = (90 − 25 · 2)/10 = 4
x1 − x2 + x3 = 0,   hence x1 = x2 − x3 = 2.
This is the answer to our problem. The solution is unique.
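A quick numerical cross-check of this example, added here as a sketch (NumPy is not part of the original slides). Because the four equations are consistent, a least-squares solve of the rectangular system returns the exact solution:

import numpy as np

# coefficient matrix and right-hand side of the 4-equation system above
A = np.array([[ 1, -1,  1],
              [-1,  1, -1],
              [ 0, 10, 25],
              [20, 10,  0]], dtype=float)
b = np.array([0, 0, 90, 80], dtype=float)

# the system is consistent, so least squares recovers the exact solution
x, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x)   # approximately [2. 4. 2.]  -> x1 = 2, x2 = 4, x3 = 2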
Gauss Elimination
At the end of the Gauss elimination (before the back substitution), the row
echelon form of the augmented matrix
• is in upper triangular form
• has its first r rows non-zero
• has exactly m − r zero rows (for a system of m equations)
• for a consistent system, the last m − r entries of the right-hand side are also zero
• if any of the last m − r right-hand-side entries is non-zero, the system is inconsistent
• costs O(n³) operations for an n × n system
• facilitates the back substitution
Solution to System of Linear
Equations
A linear system is called
• overdetermined if it has more equations than unknowns
• determined if m = n, and
• underdetermined if it has fewer equations than unknowns.
Furthermore, a system is called
• consistent if it has at least one solution (thus, one solution or infinitely many
solutions),
• inconsistent if it has no solutions at all, as in
x1 + x2 = 1,
x1 + x2 = 0
Solution
The number of nonzero rows, r, in the row-reduced
coefficient matrix R is called the rank of R and also the
rank of A. Here is the method for determining whether
Ax = b has solutions and what they are:
(a) No solution. If r is less than m (meaning that R actually
has at least one row of all 0s) and at least one of the numbers
fr+1, fr+2, … , fm is not zero, then the system Rx = f is
inconsistent: No solution is possible. Therefore the system
Ax = b is inconsistent as well.
Solution
If the system is consistent (either r = m, or r < m and all the
numbers fr+1, fr+2, … , fm are zero), then there are solutions.
(b) Unique solution. If the system is consistent and r = n,
there is exactly one solution, which can be found by back
substitution.
(c) Infinitely many solutions. To obtain any of these
solutions, choose values of xr+1, … , xn arbitrarily. Then solve
the rth equation for xr (in terms of those arbitrary values),
then the (r − 1)st equation for xr−1, and so on up the line.
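The three cases above can be checked mechanically by comparing rank A with rank [A | b] and with n. The helper below is an illustrative sketch added here (the function name classify is my own, not from the slides):

import numpy as np

def classify(A, b):
    """Classify Ax = b as inconsistent, unique, or infinitely many solutions."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float).reshape(-1, 1)
    n = A.shape[1]
    r_A = np.linalg.matrix_rank(A)
    r_Ab = np.linalg.matrix_rank(np.hstack([A, b]))
    if r_A < r_Ab:
        return "no solution"
    return "unique solution" if r_A == n else "infinitely many solutions"

print(classify([[1, 1], [1, 1]], [1, 0]))   # no solution (the inconsistent example above)
print(classify([[1, 1], [1, -1]], [1, 0]))  # unique solution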
Mathematical Foundations for Data
Science (SSCSH ZC416)
Lecture-2
BITS Pilani Dr. Pritee Agarwal
(pritee.a@[Link])
Pilani Campus
Agenda
• Fields
• Vector spaces and subspaces
• Linear independence and dependence
• Basis and dimensions
• Linear transformations
• Affine Spaces
Field – Definition and
Examples
Group: (G, ∗) is a group if
i. ∗ is closed
ii. ∗ is associative
iii. ∗ has an identity
iv. every element has an inverse under ∗
Eg: ⟨R, +⟩, ⟨R \ {0}, ×⟩
⟨G, ∗⟩ is Abelian if a ∗ b = b ∗ a ∀ a, b ∈ G
Eg: ⟨R, +⟩, ⟨R \ {0}, ×⟩ are Abelian
⟨F, +, ·⟩ is a field if ⟨F, +⟩ and ⟨F \ {0}, ·⟩ are Abelian groups and · distributes over +
Eg: ⟨R, +, ·⟩, ⟨C, +, ·⟩, ⟨Q, +, ·⟩
Vector Space
Real Vector Space
A nonempty set V of elements a, b, … is called a real vector
space (or real linear space), and these elements are called
vectors (regardless of their nature, which will come out from
the context or will be left arbitrary) if, in V, there are defined
two algebraic operations (called vector addition and scalar
multiplication) as follows.
I. Vector addition associates with every pair of vectors a and b
of V a unique vector of V, called the sum of a and b and
denoted by a + b, such that the following axioms are satisfied.
Vector Space
Real Vector Space (continued 1)
I.1 Commutativity. For any two vectors a and b of V,
a + b = b + a.
I.2 Associativity. For any three vectors a, b, c of V,
(a + b) + c = a + (b + c) (written a + b + c).
I.3 There is a unique vector in V, called the zero vector and
denoted by 0, such that for every a in V,
a + 0 = a.
I.4 For every a in V, there is a unique vector in V that is
denoted by −a and is such that
a + (−a) = 0.
Vector Space
Real Vector Space (continued 2)
II. Scalar multiplication. The real numbers are called scalars. Scalar
multiplication associates with every a in V and every scalar c a unique
vector of V, called the product of c and a and denoted by ca (or ac) such
that the following axioms are satisfied.
II.1 Distributivity. For every scalar c and vectors a and b in V,
c(a + b) = ca + cb.
II.2 Distributivity. For all scalars c and k and every a in V,
(c + k)a = ca + ka.
II.3 Associativity. For all scalars c and k and every a in V,
c(ka) = (ck)a (written cka).
II.4 For every a in V,
1a = a.
Subspace
By a subspace of a vector space V we mean
“a nonempty subset of V (including V itself) that forms a
vector space with respect to the two algebraic operations
(addition and scalar multiplication) defined for the vectors
of V.”
• A subspace (W, +, ·) sits within a vector space
• W ≠ ∅ and W ⊆ (V, +, ·) over F is a subspace if
• 0 ∈ W, and α w1 + w2 ∈ W for all w1, w2 ∈ W and α ∈ F
• Ex: V = {(x1, x2) | x1, x2 ∈ R} over R, W = {(x1, 0) | x1 ∈ R}
• The set of singular matrices is not a subspace of M2×2
Linear Dependence and
Independence of Vectors
Given any set of m vectors a(1), … , a(m) (with the same
number of components), a linear combination of these
vectors is an expression of the form
c1a(1) + c2 a(2) + … + cma(m)
where c1, c2, … , cm are any scalars.
Now consider the equation
(1) c1a(1) + c2 a(2) + … + cma(m) = 0
Clearly, this vector equation (1) holds if we choose all cj’s zero,
because then it becomes 0 = 0.
If this is the only m-tuple of scalars for which (1) holds,
then our vectors a(1) … , a(m) are said to form a linearly
independent set or, more briefly, we call them linearly
independent.
Linear Dependence and
Independence of Vectors
Otherwise, if (1) also holds with scalars not all zero, we
call these vectors linearly dependent.
This means that we can express at least one of the vectors
as a linear combination of the other vectors. For instance,
if (1) holds with, say, c1 ≠ 0, we can solve (1) for a(1):
a(1) = k2 a(2) + … + kma(m) where kj = −cj /c1.
The rank of a matrix A is the maximum number of linearly
independent row vectors of A.
It is denoted by rank A.
Linear Dependence and
Independence of Vectors
Linear Independence and Dependence of Vectors
Consider p vectors that each have n components. Then these
vectors are linearly independent if the matrix formed, with these
vectors as row vectors, has rank p.
However, these vectors are linearly dependent if that matrix has
rank less than p.
Linear Dependence of Vectors
Consider p vectors each having n components. If n < p, then these vectors are linearly
dependent.
Linear Dependence and
Independence
Let S = {v1, v2, …, vn} ⊆ V
Elements of S are LI if
Σ_{i=1}^{n} αi vi = 0   with αi = 0 ∀ i as the only solution
Elements of S are LD if
Σ_{i=1}^{n} αi vi = 0   has at least one non-zero solution
Eg: V = Rⁿ over R
LI and LD are related to rank
Basis and Dimension
The maximum number of linearly independent vectors in V is
called the dimension of V and is denoted by dim V.
A linearly independent set in V consisting of a maximum
possible number of vectors in V is called a basis for V.
The number of vectors of a basis for V equals dim V.
The vector space Rn consisting of all vectors with n components
(n real numbers) has dimension n.
•R over R One dimensional vector space
•C over C One dimensional vector space
•R over Q Infinite dimensional
Basis and Dimension
The set of all linear combinations of given vectors a(1), … ,
a(p) with the same number of components is called the
span of these vectors. Obviously, a span is a vector space.
If in addition, the given vectors a(1), … , a(p) are linearly
independent, then they form a basis for that vector space.
This then leads to another equivalent definition of basis.
A set of vectors is a basis for a vector space V if (1) the
vectors in the set are linearly independent, and if (2) any
vector in V can be expressed as a linear combination of the
vectors in the set. If (2) holds, we also say that the set of
vectors spans the vector space V.
Row Space and Column
Space
○ If A is an m × n matrix
■ the subspace of Rn spanned by the row vectors of A is called
the row space of A
■ the subspace of Rm spanned by the column vectors is called
the column space of A
The solution space of the homogeneous system of equation Ax = 0,
which is a subspace of Rn, is called the nullspace of A.
A(m×n) = [a11 a12 … a1n; a21 a22 … a2n; … ; am1 am2 … amn],
c1 = [a11; a21; … ; am1],  c2 = [a12; a22; … ; am2],  …,  cn = [a1n; a2n; … ; amn]
Basis for Row Space and
Column Space
If a matrix R is in row echelon form
– the row vectors with the leading 1’s (i.e., the nonzero
row vectors) form a basis for the row space of R
– the column vectors with the leading 1’s of the row
vectors form a basis for the column space of R
Basis for Row Space
Find a basis of the row space of
A = [1 3 1 3; 0 1 1 0; −3 0 6 −1; 3 4 −2 1; 2 0 −4 −2]
Sol:
Row reduction gives the row echelon form
B = [1 3 1 3; 0 1 1 0; 0 0 0 1; 0 0 0 0; 0 0 0 0]   (nonzero rows w1, w2, w3)
A basis for RS(A) = {the nonzero row vectors of B}
= {w1, w2, w3} = {(1, 3, 1, 3), (0, 1, 1, 0), (0, 0, 0, 1)}
Basis for Column Space
Find a basis for the column space of the matrix A, where
A = [1 2 −1 −2 0; 2 4 −1 1 0; 3 6 −1 4 1; 0 0 1 5 0]   (columns a1, a2, a3, a4, a5)
Reduce A to the reduced row-echelon form
E = [1 2 0 3 0; 0 0 1 5 0; 0 0 0 0 1; 0 0 0 0 0] = [e1 e2 e3 e4 e5]
e2 = 2e1  ⟹  a2 = 2a1
e4 = 3e1 + 5e3  ⟹  a4 = 3a1 + 5a3
{a1, a3, a5} is a basis for the column space of A
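Both examples can be reproduced with SymPy; rref() returns the reduced row-echelon form together with the pivot column indices. This is an added sketch (not from the slides) using the matrices as reconstructed above:

import sympy as sp

# row space example
A = sp.Matrix([[1, 3, 1, 3],
               [0, 1, 1, 0],
               [-3, 0, 6, -1],
               [3, 4, -2, 1],
               [2, 0, -4, -2]])
R, _ = A.rref()
# nonzero rows of the RREF form a basis for RS(A) (equivalent to the REF basis above)
print([tuple(R.row(i)) for i in range(R.rows) if any(R.row(i))])

# column space example: pivot columns of the RREF point to a basis of CS(A)
B = sp.Matrix([[1, 2, -1, -2, 0],
               [2, 4, -1,  1, 0],
               [3, 6, -1,  4, 1],
               [0, 0,  1,  5, 0]])
_, pivots = B.rref()
print(pivots)                       # (0, 2, 4) -> columns a1, a3, a5
print([B.col(j).T for j in pivots])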
Solution Space/ Null Space
● Determine a basis and the dimension of the solution space of the
homogeneous system
2x1 + 2x2 – x3 + x5 = 0
-x1 + x2 + 2x3 – 3x4 + x5 = 0
x1 + x2 – 2x3 – x5 = 0
x3+ x4 + x5 = 0
○ The general solution of the given system is
x1 = -s-t, x2 = s,
x3 = -t, x4 = 0, x5 = t
○ Therefore, the solution vectors can be written as
v1 = [−1; 1; 0; 0; 0]   and   v2 = [−1; 0; −1; 0; 1]
Solution Space/ Null Space
Find the solution space of a homogeneous system Ax = 0.
A = [1 2 −2 1; 3 6 −5 4; 1 2 0 3]
Row reduction gives the row echelon form
[1 2 0 3; 0 0 1 1; 0 0 0 0]
x1 = −2s − 3t,  x2 = s,  x3 = −t,  x4 = t
x = [x1; x2; x3; x4] = [−2s − 3t; s; −t; t] = s[−2; 1; 0; 0] + t[−3; 0; −1; 1] = s v1 + t v2
NS(A) = {s v1 + t v2 | s, t ∈ R}
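The same null-space basis can be obtained with SymPy's nullspace(); this is an added sketch (not from the slides) using the matrix as reconstructed above:

import sympy as sp

A = sp.Matrix([[1, 2, -2, 1],
               [3, 6, -5, 4],
               [1, 2,  0, 3]])

basis = A.nullspace()               # basis vectors of NS(A)
for v in basis:
    print(v.T)                      # (-2, 1, 0, 0) and (-3, 0, -1, 1)
print(A * basis[0], A * basis[1])   # both products are the zero vector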
Rank of a matrix
(Row and column space have equal dimensions)
If A is an m × n matrix, then the row space and the column
space of A have the same dimension.
The dimension of the row (or column) space of a
matrix A is called the rank of A.
Nullity of Matrix
§ The dimension of the nullspace of A is called the
nullity of A
§ Notes: rank(AT) = dim(RS(AT)) = dim(CS(A)) = rank(A)
Therefore rank(AT ) = rank(A)
(Dimension of the solution space)
If A is an m × n matrix of rank r, then the dimension
of the solution space of Ax = 0 is n − r. That is,
rank(A) + nullity(A) = n
Rank and Nullity of Matrix
rank(A): the number of leading variables in the solution of Ax = 0
(the number of nonzero rows in the row-echelon form of A).
nullity(A): the number of free variables (non-leading variables) in the
solution of Ax = 0.
If A is an m × n matrix and rank(A) = r, then
Fundamental Space      Dimension
RS(A) = CS(Aᵀ)         r
CS(A) = RS(Aᵀ)         r
NS(A)                  n − r
NS(Aᵀ)                 m − r
Rank and Nullity of Matrix
● Find the rank and nullity of the matrix
A = [−1 2 0 4 5 −3; 3 −7 2 0 1 4; 2 −5 2 4 6 1; 4 −9 2 −4 −4 7]
● Solution:
○ The reduced row-echelon form of A is
[1 0 −4 −28 −37 13; 0 1 −2 −12 −16 5; 0 0 0 0 0 0; 0 0 0 0 0 0]
○ Since there are two nonzero rows, the row space and column
space are both two-dimensional, so rank(A) = 2.
Rank and Nullity of Matrix
○ The corresponding system of equations will be
x1 – 4x3 – 28x4 – 37x5 + 13x6 = 0
x2 – 2x3 – 12x4 – 16 x5+ 5 x6 = 0
○ It follows that the general solution of the system is
x1 = 4r + 28s + 37t – 13u, x2 = 2r + 12s + 16t – 5u,
x3 = r, x4 = s, x5 = t, x6 = u
or
[x1; x2; x3; x4; x5; x6] = r[4; 2; 1; 0; 0; 0] + s[28; 12; 0; 1; 0; 0] + t[37; 16; 0; 0; 1; 0] + u[−13; −5; 0; 0; 0; 1]
Thus, nullity(A) = 4
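As an added cross-check (not part of the slides), NumPy and SciPy give the rank and an orthonormal null-space basis directly, and rank plus nullity equals the number of columns; the matrix below is the one reconstructed above:

import numpy as np
from scipy.linalg import null_space

A = np.array([[-1,  2, 0,  4,  5, -3],
              [ 3, -7, 2,  0,  1,  4],
              [ 2, -5, 2,  4,  6,  1],
              [ 4, -9, 2, -4, -4,  7]], dtype=float)

r = np.linalg.matrix_rank(A)
N = null_space(A)                      # orthonormal basis of NS(A)
print(r, N.shape[1])                   # 2 4  -> rank 2, nullity 4
print(r + N.shape[1] == A.shape[1])    # rank + nullity = number of columns (6)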
Lecture 3
Math Foundations Team
Analytic geometry
▶ We have studied vector spaces in the previous lecture.
▶ Now we would like to provide some geometric interpretation
to these concepts.
▶ We shall take a close look at geometric vectors and the
concepts of lengths of vectors and angles between vectors.
▶ But first we need to add the concept of an inner product to
our vector space.
Norms
▶ A norm on a vector space is a function ∥.∥ : V → R, x → ∥x ∥
which assigns to each vector x a length ∥x ∥ such that for all
λ ∈ R and x , y ∈ V the following properties hold:
▶ Absolutely homogeneous: ∥λx ∥ = |λ|∥x ∥
▶ Triangle inequality: ∥x + y∥ ≤ ∥x∥ + ∥y∥
▶ Positive definite: ∥x∥ ≥ 0 and ∥x∥ = 0 =⇒ x = 0
▶ Manhattan norm: ∥x∥₁ = Σ_{i=1}^{n} |xi|, where |.| represents
absolute value.
▶ Euclidean norm: ∥x∥₂ = √( Σ_{i=1}^{n} xi² ).
Inner products
▶ Dot product in Rⁿ is given by xᵀy = Σ_{i=1}^{n} xi yi
▶ A bilinear mapping Ω is a mapping with two arguments and is
linear in both arguments: Let V be a vector space such that
x , y , z ∈ V , and let λ, ψ ∈ R. Then we have
Ω(λx + ψy, z) = λΩ(x, z) + ψΩ(y, z), and
Ω(x, λy + ψz) = λΩ(x, y) + ψΩ(x, z).
▶ Let V be a vector space and Ω : V × V → R be a bilinear
mapping that takes two vectors as arguments and returns a
real number. Then Ω is called symmetric if
Ω(x, y) = Ω(y, x). Also Ω is called positive-definite if
∀x ∈ V \ {0}: Ω(x, x) > 0, and Ω(0, 0) = 0.
Inner products
▶ A positive-definite, symmetric bilinear mapping
Ω : V × V → R is called an inner product. To denote an inner
product on V we generally write ⟨x , y ⟩.
▶ The pair (V , ⟨., .⟩) is called an inner product space.
▶ Next we introduce the concept of symmetric, positive-definite
matrices and show we can express an inner product using such
matrices.
▶ We recall that in a vector space V any vector x can be written
as a linear combination of the basis vectors. We use this to
express an inner product in terms of a matrix.
Symmetric, positive-definite matrices
Theorem: For a real-valued, finite-dimensional vector space V and
an ordered basis B of V , it holds that ⟨., .⟩ : V × V → R is an
inner product if and only if there exists a symmetric, positive
definite matrix A ∈ R n×n with ⟨x , y ⟩ = x̂ T Aŷ .
Proof: One direction →: ⟨., .⟩ is an inner product =⇒ A is
symmetric, positive-definite such that ⟨x , y ⟩ = x̂ T Aŷ .
Other direction ←: A is symmetric, positive definite such that the
operation ⟨x , y ⟩ is defined as ⟨x , y ⟩ = x̂ T Aŷ =⇒ the operation
defined is an inner product.
Symmetric, positive-definite matrices
▶ We prove the → direction.
▶ Let ⟨x , y ⟩ be the inner product between the vectors x , y in V .
We can write x in terms of sayPn basis vectors as
x = i=1 ψi bi . Similarly y = i=1 λi bi .
Pi=n i=n
▶ Since the inner product is bilinear we can write ⟨x , y ⟩ =
ψi bi , i=1 λi bi ⟩ = i=1 j=1 ψi ⟨bi , bj ⟩λj = x̂ Aŷ
Pi=n Pi=n Pi=n Pj=n T
⟨ i=1
where Aij = ⟨bi , bj ⟩.
▶ Here x̂ , ŷ are vectors which represent the coordinates of the
original vectors x , y with respect to the basis vectors.
Symmetric, positive-definite matrices
▶ This means that the inner product is entirely determined
through the matrix A. The symmetry of the inner product
means that Aij = ⟨bi , bj ⟩ = Aji = ⟨bj , bi ⟩. Thus A is
symmetric.
▶ The positive-definiteness of the inner product means that
∀x ∈ V \{0}, x T Ax > 0.
Symmetric, positive-definite matrices
▶ Now let us consider an operation op such that x op y = x̂ T Aŷ
where A is a symmetric, positive definite matrix.
▶ We shall show that ”op” is an inner product by showing that
it has all the properties of an inner product:
▶ ”op” has symmetry because x op y = x̂ T Aŷ and
y opx = ŷ T Ax̂ = ŷ T (Ax̂ ). By a property of the dot product
we can write ŷ T (Ax̂ ) = (Ax̂ )T ŷ = x̂ T AT ŷ = x̂ T Aŷ where
the last equality in the chain is possible since A is symmetric.
▶ ”op” also has bilinearity since we see that for
r ∈ R, (r x )op y = (r x̂ )T Aŷ = r x̂ T Aŷ = r x op y .
▶ (x + y )op z = (x̂ + ŷ )T Aẑ = x̂ T Aẑ + ŷ T Aẑ = x op z + y op z .
▶ Finally if x is a non-zero vector then x̂ is also a non-zero
vector, x op x = x̂ T Ax̂ > 0 since we are given that A is
positive-definite.
Symmetric, positive definite matrices
▶ Can a symmetric, positive-definite matrix have less than full
rank? We have x T Ax > 0 for all non-zero x . Thus x = 0 is
the only vector allowed in the nullspace. The nullspace is
0-dimensional so A has full rank.
▶ What can be said about the diagonal elements of a
positive-definite matrix? From (ei )T Aei > 0 where ei is the
ith canonical basis vector, we see that Aii > 0. Thus the
diagonal entries are all strictly positive.
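A small NumPy sketch of this construction, added here with an illustrative symmetric, positive-definite matrix A of my own choosing (not from the slides): the mapping (x, y) → x̂ᵀAŷ is symmetric and positive on the vectors tested, and A itself has strictly positive eigenvalues.

import numpy as np

# an illustrative symmetric, positive-definite matrix (my choice, not from the slides)
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])

def inner(x, y, A=A):
    """Inner product induced by A: <x, y> = x^T A y (coordinates w.r.t. a fixed basis)."""
    return x @ A @ y

x = np.array([1.0, -2.0])
y = np.array([0.5, 3.0])

print(np.isclose(inner(x, y), inner(y, x)))      # symmetry
print(inner(x, x) > 0)                           # positivity on this x
print(np.all(np.linalg.eigvalsh(A) > 0))         # A is positive definite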
Lengths and distances
▶ Inner products and norms are closely related in the sense that
any inner product induces a norm: ∥x∥ = √⟨x, x⟩
▶ Not every norm is induced by an inner product, for example
the Manhattan norm.
▶ For an inner product vector space (V, ⟨., .⟩), the induced norm
∥.∥ satisfies the Cauchy-Schwarz inequality: |⟨x, y⟩| ≤ ∥x∥∥y∥.
Why is this true?
Cauchy-Schwarz inequality
▶ Let u and v be two vectors and let us consider the length of
the vector u − αv where α is a constant.
▶ The squared length of the vector u − αv is greater than or equal to
zero: ∥u − αv∥² = ⟨u − αv, u − αv⟩ = (u − αv)ᵀ(u − αv).
▶ We can expand the dot product:
(u − αv)ᵀ(u − αv) = uᵀu − αuᵀv − αvᵀu + α²vᵀv ≥ 0
▶ Now set α = uᵀv / vᵀv to get uᵀu − (uᵀv)²/(vᵀv) ≥ 0, which leads us to
(uᵀu)(vᵀv) ≥ (uᵀv)², which is the Cauchy-Schwarz inequality.
Metric space
▶ Consider an inner product space (V, ⟨., .⟩). Define d(x, y), the
distance between two vectors x and y, to be
d(x, y) = ∥x − y∥ = √⟨x − y, x − y⟩.
▶ If we use the dot product as the inner product, then the
distance is called the Euclidean distance.
▶ The mapping d : V × V → R is called a metric.
Properties of a metric space
A metric d has the following properties:
▶ d is positive-definite, which means d(x, y) ≥ 0 ∀x, y ∈ V and
d(x, y) = 0 =⇒ x = y.
▶ d is symmetric, which means d(x, y) = d(y, x) ∀x, y ∈ V.
▶ d obeys the triangle inequality as follows:
d(x, z) ≤ d(x, y) + d(y, z) ∀x, y, z ∈ V
Inner products and metrics seem to be very similar in terms of their
properties - however there is one important difference. When x and
y are close to each other the inner product is large but the distance
metric is small. On the other hand when x and y are far apart,
then the inner product is small but the distance metric is large.
Angles and orthogonality
▶ In addition to being able to capture the lengths of vectors and
the distance between vectors, inner products can also capture
the angle ω between two vectors and can thus capture the
geometry of a vector space.
▶ The key to using the inner product to characterize the angle
between two vectors is the Cauchy-Schwarz inequality.
▶ Assume that x and y are not the 0 vector. Then the
Cauchy-Schwarz inequality tells us that
−1 ≤ xᵀy / (∥x∥∥y∥) ≤ 1     (1)
Angles and orthogonality
▶ Since the Cauchy-Schwarz ratio lies between -1 and 1 we can
set it equal to the cosine of a unique angle ω ∈ [0, π] such that
cos(ω) = ⟨x, y⟩ / (∥x∥∥y∥)     (2)
▶ The angle ω is the angle between two vectors. What does it
capture?
▶ The notion of angle captures similarity of orientation between
two vectors. When the cosine is close to one (the dot product is
close to ∥x∥∥y∥), the vectors are more or less pointing in the
same direction and ω ≈ 0.
Angles and orthogonality
▶ Food for thought: Suppose we choose vectors x and y
uniformly at random in high dimensions. What happens to the
dot product between the vectors and hence the angle between
them?
▶ To choose a vector uniformly at random over a sphere let
every component in the vector be an independent Gaussian
random variable of mean 0 and unit variance.
▶ Write a small program to see what happens ...
Angles and orthogonality
▶ A key feature of the inner product is that we can use it to
characterize vectors that are orthogonal.
▶ Two vectors x and y are orthogonal if and only if the inner
product between them is 0. For an orthogonal pair of vectors
x , y we can write x ⊥ y .
▶ By the above definition the 0-vector is orthogonal to all
vectors.
▶ Vectors which are orthogonal with respect to one inner
product need not be orthogonal with respect to another inner
product.
Example - angles and orthogonality
▶ Consider the vectors x = [1, 1]T and y = [−1, 1]T
▶ With respect to the inner product defined as a dot product we
see that ⟨x , y ⟩ = x T y = 1 ∗ −1 + 1 ∗ 1 = 0.
▶ With respect to the inner product ⟨x, y⟩ = xᵀ [2 0; 0 1] y, the angle
between the two vectors x and y becomes
cos(ω) = ⟨x, y⟩ / (∥x∥∥y∥)
Example - angles and orthogonality
▶ Continuing with our example we have
cos(ω) = xᵀAy / √( (xᵀAx)(yᵀAy) )
       = (2x1y1 + x2y2) / √( (2x1² + x2²)(2y1² + y2²) )
       = −1/3
where A = [2 0; 0 1].
▶ Thus, with respect to the new definition of inner product the
vectors x and y are no longer orthogonal.
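The example can be checked numerically; the sketch below (added, not from the slides) computes cos(ω) under the plain dot product and under the inner product defined by A = [2 0; 0 1]:

import numpy as np

x = np.array([1.0, 1.0])
y = np.array([-1.0, 1.0])
A = np.array([[2.0, 0.0],
              [0.0, 1.0]])

def cos_angle(x, y, M):
    """cos(omega) for the inner product <u, v> = u^T M v."""
    return (x @ M @ y) / np.sqrt((x @ M @ x) * (y @ M @ y))

print(cos_angle(x, y, np.eye(2)))   # 0.0 -> orthogonal under the dot product
print(cos_angle(x, y, A))           # -1/3 -> no longer orthogonal under A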
Orthonormal matrix
▶ A square matrix A ∈ R n×n is an orthogonal matrix if and only
if its columns are orthonormal:
AT A = I = AAT
AT = A−1
▶ If the columns of a matrix are orthonormal, why are its rows
orthonormal too? This follows from the fact that the
left-inverse of a square matrix is the same as the right-inverse.
Let A be a square matrix with B and C the left and right
inverses of A: BA = I = AC =⇒ B = C . Why is this true?
Orthonormal matrix
▶ Transformations by an orthonormal matrix preserve lengths.
This can be seen as follows, using the dot product as the
definition of the inner product:
∥Ax∥² = (Ax)ᵀ(Ax) = xᵀAᵀAx = xᵀIx = xᵀx.
▶ An example of an orthonormal matrix is the 2D-rotation
matrix, which can be expressed as [cos θ  −sin θ; sin θ  cos θ], where θ is
the angle of rotation.
Orthonormal matrix
▶ Also the angle between two vectors x and y does not change
after transformation by an orthonormal matrix. This can be
seen as follows:
cos(ω) = (Ax)ᵀ(Ay) / (∥Ax∥ ∥Ay∥)
       = xᵀAᵀAy / (∥x∥ ∥y∥)
       = xᵀy / (∥x∥ ∥y∥)
Orthonormal basis
▶ We already looked at the concept of a basis of a vector space,
and found that for the vector space R n we need n basis
vectors.
▶ Our basis vectors needed only to be linearly independent - we
can ensure linear independence by ensuring that our basis
vectors point in different directions, so that a linear
combination of n − 1 basis vectors cannot cancel out the nth
basis vector.
▶ Now we will look at a special case of a basis where the vectors
are all mutually orthogonal in the sense of the inner product,
and each vector is of unit length. We call such a basis an
orthonormal basis.
Orthonormal basis
▶ Question: Can you immediately think of an orthonormal basis
for R n ? Is an orthonormal basis for a vector space unique?
▶ Formal definition of an orthonormal basis: Consider an
n-dimensional vector space V and n basis vectors
{b1 , b2 , . . . bn }. If it is true that
∀i, j = 1, . . . n, i ̸= j ⟨bi , bj ⟩ = 0 and ⟨bi , bi ⟩ = 1, then the
basis is called an orthonormal basis.
▶ If the basis vectors are only mutually orthogonal but not of
length unity, then we have an orthogonal basis.
Gram-Schmidt process
▶ Given a set of basis vectors for a vector space, can we convert
the given basis into an orthogonal basis? Yes, we shall use
Gaussian elimination to construct such a basis.
▶ Let us start with an example: Consider R² and two basis
vectors v1 = (3, 1)ᵀ and v2 = (2, 2)ᵀ. Put these vectors into the
columns of a matrix A such that A = [3 2; 1 2].
▶ The next step is to perform Gaussian elimination on the
following augmented matrix: [AᵀA | Aᵀ] = [10 8 | 3 1; 8 8 | 2 2]
▶ On performing Gaussian elimination on this augmented matrix
we end up with [1 0.8 | 0.3 0.1; 0 1 | −0.25 0.75]
Gram-Schmidt process
▶ Note that after the completion of Gaussian elimination the
two rows on the right hand side are orthogonal. They form a
basis for R 2 . We can normalize the vectors to get an
orthonormal basis.
▶ What is the justification for this technique?
▶ First we see that when the m × n matrix A has full column
rank, then the matrix AT A is positive definite. To see this
note that any solution x to Ax = 0 is also a solution to
AT Ax = 0 and vice-versa. Why is this the case?
▶ When A has full column rank, there are no non-trivial solutions to
Ax = 0. Thus the fact that there are no non-trivial solutions
to Ax = 0 means that ∀x ∈ Rⁿ, x ≠ 0: xᵀAᵀAx = ∥Ax∥² > 0.
Elementary transformations
▶ One of the steps of Gaussian elimination is the subtraction
of a multiple of a given row from a row below it. This step
can be achieved by pre-multiplication of the given matrix by
an elementary matrix. An elementary matrix is like an identity
matrix except that one of the entries below the diagonal is
allowed to be non-zero.
▶ To show how the process of elimination works using an
elementary matrix, consider the matrix A = [a11 a12 a13; a21 a22 a23; a31 a32 a33]
and assume that we want to subtract two times the first row
from the second row.
Elementary transformations
This can be accomplished by the following elementary matrix
E = [1 0 0; −2 1 0; 0 0 1], so that the product
EA = [1 0 0; −2 1 0; 0 0 1] [a11 a12 a13; a21 a22 a23; a31 a32 a33]
   = [a11 a12 a13; a21 − 2a11  a22 − 2a12  a23 − 2a13; a31 a32 a33]
Product of elementary transformations
▶ A series of Gaussian elimination steps can be represented as a
product of elementary transformations acting on A:
Em Em−1 . . . E1 A.
▶ The product of lower triangular matrices can be seen to be
lower triangular, and the inverse of a lower triangular matrix
can also be seen as a lower triangular matrix.
▶ Thus the action of Gaussian elimination operations can be
seen in the following terms L−1 A = U where the product of
the elementary transformations is represented as the inverse of
a lower triangular matrix for notational convenience, and the
right hand side U is an upper triangular matrix. Thus we have
A = LU .
Final argument
▶ Returning to our problem we are performing Gaussian
elimination on the matrix AT A where A contains the basis
vectors as its columns. Upon Gaussian elimination on the
augmented matrix we reduce [AᵀA | Aᵀ] to get [U | L⁻¹Aᵀ]
where AT A = LU .
▶ Now we shall show that Q T = L−1 AT is an orthogonal matrix
whose rows are orthogonal.
▶ Consider Q T Q = L−1 AT A(L−1 )T = U (L−1 )T = some
upper triangular matrix
▶ But Q T Q is a symmetric matrix and can only be upper
triangular if it is diagonal. Therefore Q is an orthogonal
matrix whose columns are orthogonal. They can be
normalized to obtain an orthonormal basis.
Lecture 4
Math Foundations Team
Matrix decompositions
▶ We studied vectors and how to manipulate them in preceding
lectures.
▶ Mappings and transformations of vectors can be conveniently
described in terms of operations performed by matrices.
▶ In this lecture we shall study three aspects of matrices: how
to summarize matrices, how matrices can be decomposed, and
how the decompositions can be used for matrix
approximations.
Determinant and trace
A determinant of order n × n is a scalar associated with an n × n
matrix and is denoted as follows:
det(A) = |a11 a12 … a1n; a21 a22 … a2n; … ; an1 an2 … ann|
We have a cofactor formula to calculate a determinant of order n:
for n = 1, we have det(A) = a11 . For n ≥ 2 we have
D = det(A) = aj1 Cj1 + aj2 Cj2 + . . . + ajn Cjn
= a1k C1k + a2k C2k + . . . + ank Cnk
The first line above represents expansion along the jth row, while
the second line represents expansion along the kth column.
Cofactor formula
▶ In the preceding slide, the Cjk = (−1)j+k Mjk where Mjk
represents the n − 1 order determinant of the submatrix of A
obtained by removing the jth row and kth column.
▶ Mjk is called the minor of ajk in D and Cjk is called the
cofactor of ajk in D.
▶ Our definition for the determinant in the previous slide shows
that the n × n determinant is defined in terms of
(n − 1) × (n − 1) determinants which in turn are defined in
terms of (n − 2) × (n − 2) determinants recursively.
▶ Let us examine the computation for a simple 3 × 3
determinant.
Example
Let us compute D = |a11 a12 a13; a21 a22 a23; a31 a32 a33|.
For the entries in the second row the minors are
M21 = |a12 a13; a32 a33|,  M22 = |a11 a13; a31 a33|,  M23 = |a11 a12; a31 a32|.
The cofactors are
C21 = (−1)²⁺¹ M21 = −M21,  C22 = (−1)²⁺² M22 = M22,  C23 = (−1)²⁺³ M23 = −M23.
Expanding along the second row we can write
D = a21 C21 + a22 C22 + a23 C23 .
Behaviour of determinant
Theorem: We can state the following for a nth order determinant
under elementary row operations:(a) interchanging two rows
multiplies the value of the determinant by −1, and (b) adding a
multiple of a row to another row does not change the value of the
determinant.
Proof Sketch Let us look at how to prove (a). The proof is by
induction. The statement holds for n = 2 since |a b; c d| = ad − bc
whereas |c d; a b| = bc − ad = −(ad − bc). We make the induction
hypothesis that the statement is true for all determinants of order
(n − 1). Let D represent the original determinant and E represent
the determinant with rows interchanged.
Proof sketch continued
Let us expand D and E along a row that is not interchanged.
We have
D = Σ_{k=1}^{n} (−1)ʲ⁺ᵏ ajk Mjk
E = Σ_{k=1}^{n} (−1)ʲ⁺ᵏ ajk Njk
where Njk in E is obtained by exchanging two rows of the minor
Mjk in D. Mjk and Njk are determinants of order n − 1 where one
of the determinants has a pair of rows interchanged as compared
to the other determinant. Therefore Mjk = −Njk, and D = −E.
Proof sketch continued
Let us now look at adding multiples of a row to another row.
▶ Add c times row i to row j. Then we get a new determinant
D′ whose entries in the jth row are ajk + c aik. Expanding the
determinant D′ along the jth row, we have
D′ = Σ_{k=1}^{n} (ajk + c aik)(−1)ʲ⁺ᵏ Mjk
   = Σ_{k=1}^{n} (−1)ʲ⁺ᵏ ajk Mjk + c Σ_{k=1}^{n} (−1)ʲ⁺ᵏ aik Mjk.
▶ The summation can be written as D ′ = D1 + cD2 where
D1 = D and D2 represents the determinant of a matrix similar
to the one we started out with except that rows j and i both
have coefficients aik in them. Thus two rows of D2 are equal -
this makes D2 = 0 → why is this?
Proof sketch continued
▶ In part (a) we showed that interchanging two rows will negate
the determinant. If we interchange rows i and j we will get
the same determinant since the two rows are identical. But
one of the determinants must be the negative of the other.
This is only possible when both determinants are zero. Thus
D2 = 0 and D ′ = D.
▶ Bottom line → Adding a multiple of one row to another row
does not change the determinant.
▶ This will lead us to our next result.
Another result
Theorem An n × n matrix A has rank n if and only if its
determinant is not equal to zero.
Proof sketch: A has full rank =⇒ det(A) ̸= 0: Gaussian
elimination reduces A to upper triangular matrix U = (Uij ) whose
determinant is the product of all the elements Uii . But we know
that det(A) = (−1)s det(U ) where s is the number of row
interchanges performed during Gaussian elimination. Since A has
full rank, the columns of A are linearly independent and the only
solution to Ax = 0 is x = 0. The system Ax = 0 has the same set
of solutions as Ux = 0, so this means U has only pivot columns.
The pivots are all the elements Uii . The product of all pivots is
non-zero, and hence det(U ) = det(A) ̸= 0.
Another result
det(A) ̸= 0 =⇒ A has full rank: If the determinant of A is
non-zero, det(A) = (−1)ˢ det(U) = Π_{i=1}^{n} Uii means that all the
Uii are non-zero. Therefore all the columns of U are pivot columns
with the pivots being the Uii . The pivot columns are all linearly
independent, so the only solution to Ux = 0 is x = 0. This is also
the only solution to Ax = 0 which means A has full rank.
Trace
▶ The trace of an n × n square matrix A is defined as
tr(A) = Σ_{i=1}^{n} aii, i.e. the sum of the diagonal elements of A.
▶ The trace has the following properties:
▶ tr (A + B ) = tr (A) + tr (B ), for A, B ∈ Rn×n
▶ tr (αA) = αtr (A), α ∈ R
▶ tr (In ) = n
▶ tr (AB ) = tr (BA) for A ∈ Rn×k , B ∈ Rk×n .
▶ The proofs of these properties are not difficult.
Characteristic polynomial
▶ For λ ∈ R and A ∈ Rn×n we can define pA (λ) = det(A − λI )
and show that it can be written as
c0 + c1λ + … + cn−1λⁿ⁻¹ + (−1)ⁿλⁿ where c0, c1, …, cn−1 ∈ R.
▶ We can show that c0 = det(A) and cn−1 = (−1)n−1 tr (A)
▶ To see that c0 = det(A), set λ = 0 in det(A − λI ) to get
pA (0) = det(A) = c0
▶ The formula for cn−1 takes a little bit of work - let us expand a
3 × 3 determinant:
det(A − λI) = |a11 − λ  a12  a13; a21  a22 − λ  a23; a31  a32  a33 − λ|
Characteristic polynomial
▶ Expanding the determinant along the first row we see that the
(a11 − λ)C11 term contains the product Π_{i=1}^{3} (aii − λ), which
contains the powers λ³ and λ². The other contributors to the
determinant, i.e. a12C12 and a13C13, expand into terms where
the maximum power of λ is 1.
▶ Carrying this analogy over to the general case of n > 3 we see
that, expanding along the first row, the first contributor to the
determinant will have the term Π_{i=1}^{n} (aii − λ) and subsequent
contributors will have a maximum power of λⁿ⁻² as the
minors for each such contributor will kill off a term containing
λ in a given row and column.
Characteristic polynomial
▶ Thus in the determinant expansion to obtain the characteristic
polynomial we see that the coefficient of λⁿ⁻¹ can only come
from the expansion of Π_{i=1}^{n} (aii − λ) and can be seen
to be (−1)ⁿ⁻¹ Σ_{i=1}^{n} aii = (−1)ⁿ⁻¹ tr(A).
▶ As a corollary to this argument we can see that the coefficient
to λn in the characteristic polynomial is (−1)n
▶ We will use the characteristic polynomial to compute
eigenvalues and eigenvectors.
Eigenvalues and eigenvectors
▶ Let A ∈ Rⁿˣⁿ be a square matrix. Then λ ∈ R is an eigenvalue
of A and x ∈ Rⁿ \ 0 is the corresponding eigenvector of λ if
Ax = λx. This equation is called the eigenvalue equation.
▶ The following statements are equivalent:
▶ λ is an eigenvalue of A ∈ Rn×n .
▶ There exists an x ∈ Rn \ 0 with Ax = λx , or equivalently,
(A − λIn )x = 0 can be solved non-trivially, i.e., x ̸= 0.
▶ rank(A − λIn ) < n.
▶ det(A − λIn ) = 0.
▶ If x is an eigenvector corresponding to a particular eigenvalue
λ, c x , c ∈ R \ 0 is also an eigenvector.
Eigenvalues and eigenvectors - example
▶ Consider the matrix A = [1 1; 1 1]. The characteristic
polynomial det(A − λI) = (1 − λ)² − 1 and setting it to zero
gives us the roots of the characteristic polynomial:
(1 − λ)² − 1 = 0 has roots λ = 2, 0.
▶ What are the eigenvectors? For λ = 0 we solve for Ax = 0x,
so we find the nullspace of the matrix A. Using Gaussian
elimination we convert Ax = 0 to Ux = 0 where U = [1 1; 0 0].
Thus we discover the eigenvector [1; −1] for λ = 0.
▶ Similarly we discover the eigenvector [1; 1] for λ = 2.
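The same eigenvalues and eigenvectors come out of NumPy; this is an added check, not part of the slides (NumPy normalizes the eigenvectors to unit length and may order the eigenvalues differently):

import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0]])

vals, vecs = np.linalg.eig(A)
print(vals)                                 # eigenvalues 2 and 0 (order may differ)
print(vecs)                                 # columns proportional to (1, 1) and (1, -1)
print(np.allclose(A @ vecs, vecs * vals))   # Ax = lambda x for each column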
Eigenvalues and eigenvectors - example
▶ The general procedure to find eigenvalues and eigenvectors is
to first find the roots of the characteristic polynomials and
then find the nullspaces of the matrices A − λI for the
different roots λ.
▶ Does every n × n matrix have a full set of eigenvectors, i.e n
eigenvectors?
▶ Look at [0 1; 0 0]. What are its eigenvalues and eigenvectors?
▶ Point to ponder Looking at the equation Ax = λx it seems
that the action of A on x is to preserve the direction of x but
scale it up or down according to λ. Does this mean that a
rotation matrix has no eigenvalues and eigenvectors?
Some additional properties
▶ λ is an eigenvalue of A if and only if λ is a root of the
characteristic polynomial pA (λ) of A. This can be easily seen
as a consequence of the definition of pA (λ).
▶ For A ∈ Rn×n , the set of eigenvectors corresponding to an
eigenvalue λ spans a subspace of Rn called the Eigenspace of
A with respect to λ and is denoted by Eλ .
▶ The set of all eigenvalues of A is called the spectrum of A.
▶ Look at the eigenvalues and eigenspace of the n × n identity
matrix In . It has one eigenvalue λ = 1 and the eigenspace is
Rn . Every canonical vector is a basis vector for the eigenspace.
Some additional properties
▶ A matrix and its transpose have the same eigenvalues. To see
this, first note that det(A) = det(AT ). Then det(A − λI ) =
det((A − λI )T ) = det(AT − λI T ) = det(AT − λI ). The last
expression in the chain of equalities is the characteristic
polynomial for pAT (λ). Thus we have pA (λ) = pAT (λ) which
means the characteristic polynomials are equal and so the
roots of the polynomials or the eigenvalues must be equal.
▶ The eigenspace Eλ is the nullspace of A − λI .
▶ Symmetric, positive-definite matrices always have positive,
real eigenvalues.
Some theorems
▶ The eigenvectors x1 , x2 . . . xn of a n × n matrix A with n
distinct eigenvalues are linearly independent → why?
▶ Given a matrix A ∈ Rm×n we can show that AT A ∈ Rn×n is a
symmetric, positive-definite matrix when the rank of A = n.
Why is this true? Clearly AT A is a symmetric matrix and it is
positive definite since x T AT Ax = ∥Ax ∥2 > 0 ∀x ∈ Rn \ 0
since the nullspaces of AT A and A are the same, and A is a
full column rank matrix.
▶ The matrix AT A is important in machine learning since it
figures in the least-squares solution to a data matrix
represented as A where n represents the number of features
and m is the number of data vectors.
Spectral theorem
Theorem: If A ∈ Rn×n is symmetric there exists an orthonormal
basis of the corresponding vector space V consisting of the
eigenvectors of A, and each eigenvalue is real.
Proof: We will not attempt a full proof of this theorem but
provide some intuitions about why it is true. The theorem relies on
the following three statements, shown in the next slide.
Spectral theorem
▶ All roots of the characteristic polynomial pA (λ) are real.
▶ For each eigenvalue λ we can compute an orthonormal basis
for its eigenspace. We can string together the orthonormal
bases for the different eigenvalues of A to come up with the
vectors v1 , v2 ...
▶ The dimension of the eigenspace Eλ , called its geometric
multiplicity, is the same as the algebraic multiplicity of λ
which is the number of times λ appears as a root of the
characteristic polynomial.
▶ All the basis vectors from the different Eigenspaces combine
to provide an orthonormal basis for Rn .
Complex vectors
▶ In the old formulation with real vectors, length-squared
according to the Euclidean norm was x1² + x2² + … + xn². If the
xi are complex we should take length-squared to be
|x1|² + |x2|² + … + |xn|² where |.| denotes modulus. For the
complex number a + bi, the modulus is
√((a + bi)(a − bi)) = √(a² + b²)
▶ For complex vectors we would like to preserve the idea as
possible that ∥x∥2 = x T x . If we keep the old definition of
inner product for complex vectors we will not get a real
number as length as shown in the next bullet.
▶ Let x = [1 + i; 2 + i]. We have
xᵀx = (1 + i)² + (2 + i)² = 1 + 2i + i² + 4 + 4i + i² = 6i + 3.
Hermitian matrices
▶ We modify the inner product between two complex vectors x
and y to xᴴy, where xᴴ = x̄ᵀ (the conjugate transpose of x).
▶ Now xᴴx = x̄1x1 + … + x̄nxn = |x1|² + … + |xn|² = ∥x∥² according to the new
definition of length.
▶ A Hermitian matrix is a generalization of a symmetric matrix.
▶ Instead of requiring Aᵀ = A, we say a matrix is Hermitian if
it is equal to its conjugate-transpose, i.e. A is a Hermitian
matrix if Aᴴ = A, that is, Āᵀ = A.
▶ As an example consider the matrix A = [1  3 − i; 3 + i  4]. It is a
Hermitian matrix since Aᴴ = [1  3 − i; 3 + i  4] = A.
Spectral theorem
We shall now show that all eigenvalues for a symmetric matrix are
real. Let Ax = λx . Then premultiplying with x H on both sides we
have x H Ax = λx H x
Now x H Ax is a 1 × 1 matrix. Taking the Hermitian of this matrix
we have (x H Ax )H = x H AH x = x H Ax , so the Hermitian of the
matrix is itself which means that the matrix is real.
On the right hand side we note that x H x is real, so this means
that λ must be real.
Spectral theorem
Let us show that eigenvectors belonging to different eigenvalues
are orthogonal. Let Ax = λx and Ay = µy . Then we have
y H Ax = λy H x
x H Ay = µx H y
But x H Ay = (y H AH x )H = (y H Ax )H = λx H y . We already know
that x H Ay = µx H y . This means λx H y = µx H y . Since λ ̸= µ,
this must mean x H y = 0.
This shows that eigenvectors corresponding to different eigenvalues
are orthogonal.
Spectral theorem
▶ So we see that the eigenvalues of a symmetric matrix are real
and eigenvectors belonging to different eigenvalues are
orthogonal.
▶ This suggests that one can string together all the orthonormal
bases for the different eigenvalues and get an orthonormal
basis for Rn .
▶ But who is to say that when we string together the basis
vectors for all the eigenvalues, we will have enough vectors to
describe Rn ? We need n basis vectors and might end up
having fewer than n vectors.
▶ If the eigenvalues are all different, we can see that we will
indeed have enough basis vectors. But what about when there
are repeated eigenvalues?
Spectral theorem
▶ We need one more piece to complete the puzzle and show
that we will have enough eigenvectors to complete the
orthonormal basis - this part we shall not prove!
▶ As a consequence of the spectral theorem we can write a real
symmetric matrix A as A = Q ΛQ T where Q is an
orthonormal basis (think orthonormal basis vectors for
example), and Λ is a diagonal matrix consisting of non-zero
entries only along the diagonal.
▶ The spectral theorem can be used in a machine learning
context since we can take the data matrix A and create a
symmetric matrix out of it - AT A and AAT which are both
used in Singular-Value Decomposition and PCA.
Trace and eigenvalues
▶ We can show that the sum of the eigenvalues of a matrix is
equal to the trace of the matrix, i.e. Σ_{i=1}^{n} λi = Σ_{i=1}^{n} aii. To
see why this is true, note that the characteristic polynomial
pA(λ) can be written as Π_{i=1}^{n} (λi − λ). The coefficient of
λⁿ⁻¹ in this expansion is (−1)ⁿ⁻¹ Σ_{i=1}^{n} λi. Early on in this
lecture we showed from a direct expansion of the determinant
that the coefficient of λⁿ⁻¹ is (−1)ⁿ⁻¹ Σ_{i=1}^{n} aii. Thus we
have our result.
▶ The product of all eigenvalues is the determinant of the
matrix, i.e. det(A) = Π_{i=1}^{n} λi. To see why this is true, let us
once again look at the factorisation of pA(λ) as
det(A − λI) = pA(λ) = Π_{i=1}^{n} (λi − λ). Setting λ = 0 in this
equation gives the result.
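Both identities are easy to confirm numerically; the matrix in the added sketch below is illustrative (not from the slides):

import numpy as np

# any square matrix will do; this one is illustrative
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])

lam = np.linalg.eigvals(A)
print(np.isclose(lam.sum(), np.trace(A)))        # sum of eigenvalues = trace
print(np.isclose(lam.prod(), np.linalg.det(A)))  # product of eigenvalues = determinant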
Cholesky decomposition
Theorem A symmetric, positive-definite matrix A can be
factorized into a product A = LLT where L is a lower-triangular
matrix with positive elements.
For an example 3 × 3 matrix we can write
[a11 a12 a13; a21 a22 a23; a31 a32 a33] = [l11 0 0; l21 l22 0; l31 l32 l33] [l11 l21 l31; 0 l22 l32; 0 0 l33]
We can solve for the elements of the lower triangular matrix to get
l11 = √a11,  l22 = √(a22 − l21²),  l33 = √(a33 − (l31² + l32²)).
For the elements below the diagonal we have l21 = a21/l11, l31 = a31/l11
and l32 = (a32 − l31 l21)/l22.
An application of Cholesky decomposition
▶ In the Data Science / Machine Learning context, distributions
on data are frequently multivariate Gaussian.
▶ Multivariate Gaussian distributions are governed by a
covariance matrix which is symmetric, positive-definite.
▶ We may need to draw samples from such distributions which
is where the Cholesky decomposition finds an important
application.
▶ To generate a sample from a multivariate Gaussian
distribution, we factor the covariance matrix into its Cholesky
factor L, generate a Gaussian random vector x on
independent variables which is easy to do, and compute Lx
which will be a sample according to the required distribution.
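A minimal sketch of this sampling recipe (added here; the covariance matrix Sigma is an illustrative choice, not from the slides):

import numpy as np

rng = np.random.default_rng(0)

# an illustrative symmetric, positive-definite covariance matrix
Sigma = np.array([[4.0, 1.2],
                  [1.2, 1.0]])
mu = np.zeros(2)

L = np.linalg.cholesky(Sigma)          # Sigma = L L^T
z = rng.standard_normal((2, 10000))    # independent standard normal samples
samples = mu[:, None] + L @ z          # samples with covariance approximately Sigma

print(np.allclose(L @ L.T, Sigma))
print(np.cov(samples))                 # close to Sigma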
Lecture 5
Math Foundations Team
Introduction
▶ In the previous lecture, we discussed eigenvalues and
eigenvectors of matrices
▶ In this lecture, we will look at two related methods for
factorizing matrices into canonical forms.
▶ The first one is known as eigenvalue decomposition. It uses
the concepts of eigenvalues and eigenvectors to generate the
decomposition
▶ The second method known as singular value decomposition or
SVD is applicable to all matrices
Diagonal Matrices
▶ A diagonal matrix is a matrix that has value zero on all off
diagonal elements.
D = diag(d1, …, dn), i.e. the matrix with d1, …, dn on the diagonal and zeros elsewhere.
▶ For a diagonal matrix D, the determinant is the product of its
diagonal entries.
▶ A matrix power Dk is given by each diagonal element raised
to the power k.
▶ The inverse of a diagonal matrix is obtained by taking the
reciprocal of each (non-zero) diagonal entry.
Diagonalizable Matrices
▶ A matrix A ∈ Rn×n is diagonalizable if there exists an
invertible matrix P ∈ Rn×n and a diagonal matrix D such that
D = P−1 AP
▶ In the definition of diagonalization, it is required that P is an
invertible matrix. Assume p1 , p2 , ....., pn are the n columns of
P
▶ Rewriting we get AP = PD. By observing that D is a
diagonal matrix, we can simplify as
Api = λi pi
where λi is the i th diagonal entry in D.
Diagonalizable Matrix
▶ Consider a square matrix A = [1 4; 2 3]
▶ Consider the invertible matrix P = [−2 1; 1 1]
▶ Now consider the product P⁻¹AP as follows
[−2 1; 1 1]⁻¹ · [1 4; 2 3] · [−2 1; 1 1] = [−1 0; 0 5]
Eigendecomposition of a matrix
▶ Recall the existence of eigenvalues and eigenvectors for square
matrices
▶ Eigenvalues can be used to create a matrix decomposition
known as Eigenvalue Decomposition
▶ A square matrix A ∈ Rn×n can be factored into
A = PDP−1
▶ where P is an invertible matrix of eigenvectors of A assuming
we can find n eigenvectors that form a basis of Rn
▶ and D is a diagonal matrix whose diagonal entries are the
eigenvalues of A
Example of Eigendecomposition
Let us compute the eigendecomposition of the matrix A
    A = [ 2.5  −1  ]
        [ −1   2.5 ]
▶ Step 1: Find the eigenvalues and eigenvectors
    A − λI = [ 2.5 − λ    −1    ]
             [   −1     2.5 − λ ]
▶ The characteristic equation is given by det(A − λI) = 0
▶ This leads to the equation λ² − 5λ + 21/4 = 0
▶ Solving the quadratic equation gives us λ1 = 3.5 and λ2 = 1.5
Example of Eigendecomposition
▶ The eigenvector corresponding to λ1 = 3.5 is derived as
    p1 = [ −1/√2 ]
         [  1/√2 ]
▶ The eigenvector corresponding to λ2 = 1.5 is derived as
    p2 = [ 1/√2 ]
         [ 1/√2 ]
▶ Step 2: Construct the matrix P to diagonalize A; its columns are
eigenvectors of A (taken here up to sign)
    P = [  1/√2   1/√2 ]
        [ −1/√2   1/√2 ]
Example of Eigendecomposition
▶ The inverse of matrix P is given by
    P⁻¹ = [ 1/√2  −1/√2 ]
          [ 1/√2   1/√2 ]
▶ The eigendecomposition of the matrix A is given by
    A = [  1/√2   1/√2 ] [ 3.5   0  ] [ 1/√2  −1/√2 ]
        [ −1/√2   1/√2 ] [  0   1.5 ] [ 1/√2   1/√2 ]
▶ In summary we have obtained the required matrix
factorization using eigenvalues and eigenvectors.
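The same factorization can be reproduced with NumPy (an illustrative check; np.linalg.eigh is used because A is symmetric, and it returns the eigenvalues in ascending order, i.e. 1.5 before 3.5).

```python
# Eigendecomposition A = P D P^T of the symmetric matrix from the example.
import numpy as np

A = np.array([[2.5, -1.0],
              [-1.0, 2.5]])

eigvals, P = np.linalg.eigh(A)       # eigvals = [1.5, 3.5]
D = np.diag(eigvals)
print(np.allclose(P @ D @ P.T, A))   # True, since P is orthogonal
```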
Symmetric Matrices and Diagonalizability
▶ Recall that a matrix A is called symmetric if A = Aᵀ, for example
    A = [ −2  1 ]
        [  1  1 ]
▶ A symmetric matrix A ∈ Rn×n can always be diagonalized.
▶ This follows directly from the spectral theorem discussed in the
previous lecture.
▶ Moreover, the spectral theorem states that we can find an
orthogonal matrix P of eigenvectors of A.
Motivation for Singular Value Decomposition
▶ The singular value decomposition (SVD) of a matrix is a
central matrix decomposition method in linear algebra.
▶ The eigenvalue decomposition is applicable to square matrices
only.
▶ The singular value decomposition exists for all rectangular
matrices.
▶ SVD involves writing a matrix as a product of three matrices
U, Σ and Vᵀ.
▶ The three component matrices are derived by applying the
eigenvalue decomposition discussed previously.
Singular Value Decomposition Theorem
▶ Let A ∈ Rm×n be a rectangular matrix. Assume that A has rank r .
▶ The Singular value decomposition of A is defined as
A = UΣVT
▶ U ∈ Rm×m is an orthogonal matrix with column vectors ui where
i = 1, ..., m
▶ V ∈ Rn×n is an orthogonal matrix with column vectors vj where
j = 1, ..., n
▶ Σ is an m × n matrix with Σii = σi ≥ 0 (and σi > 0 for i = 1, ..., r )
▶ The diagonal entries σi , i = 1, ..., r of Σ are called the singular
values.
▶ By convention, the singular values are ordered, i.e. σ1 ≥ σ2 ≥ . . . ≥ σr .
Properties of Σ
▶ The singular value matrix Σ is unique.
▶ Observe that the Σ ∈ Rm×n matrix is rectangular. In
particular, Σ is of the same size as A.
▶ This means that Σ has a diagonal submatrix that contains the
singular values and needs additional zero padding.
▶ Specifically, if m > n, then the matrix Σ has diagonal
structure up to row n and then consists of zero rows.
▶ If m < n, the matrix Σ has a diagonal structure up to column
m and columns that consist of 0 from m + 1 to n.
Construction of V
▶ It can be observed that
AT A = VΣT ΣVT
▶ Since AT A has the following eigendecomposition
AT A = PDPT
▶ Therefore, the eigenvectors of AT A that compose P are the
right-singular vectors V of A.
▶ The eigenvalues of AT A are the squared singular values of Σ
Construction of U
▶ It can be observed that
AAT = UΣVT VΣT UT
▶ Since AAT has the following eigendecomposition
AAT = SDST
▶ Therefore, the eigenvectors of AAT that compose S are the
left-singular vectors U of A
Construction of U continued
▶ A = UΣVT can be rearranged to obtain a simple formulation
for ui
▶ By postmultiplying by V we get AV = UΣVT V
▶ By observing that V is orthogonal we obtain a simple form
AV = UΣ
▶ This is equivalent to the following
    ui = (1/σi) Avi ,   ∀ i = 1, 2, . . . , r
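The construction of V, Σ and U described above can be sketched directly in NumPy (a hedged sketch under the stated assumptions; the matrix A is the example used on the following slides).

```python
# Build an SVD of A from the eigendecomposition of A^T A, then U from u_i = A v_i / sigma_i.
import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [-2.0, 1.0, 0.0]])

eigvals, V = np.linalg.eigh(A.T @ A)          # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]             # reorder: largest first
eigvals, V = eigvals[order], V[:, order]

sigma = np.sqrt(np.clip(eigvals, 0.0, None))  # singular values
r = int(np.sum(sigma > 1e-12))                # rank of A

U = np.column_stack([A @ V[:, i] / sigma[i] for i in range(r)])
Sigma_r = np.diag(sigma[:r])

print(np.allclose(U @ Sigma_r @ V[:, :r].T, A))   # reconstructs A
```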
Computing Singular Value Decomposition 1
▶ We want to find the SVD of the following rectangular matrix A
    A = [  1  0  1 ]
        [ −2  1  0 ]
▶ Let us consider the matrix AᵀA derived from A, given by
    AᵀA = [  5  −2  1 ]
          [ −2   1  0 ]
          [  1   0  1 ]
▶ It is a symmetric matrix
Computing Singular Value Decomposition 2
▶ Derive the eigendecomposition of AT A in the form PDP T
▶ The matrix P is given by
    P = [  5/√30    0     −1/√6 ]
        [ −2/√30   1/√5   −2/√6 ]
        [  1/√30   2/√5    1/√6 ]
▶ The matrix D is given by
    D = [ 6  0  0 ]
        [ 0  1  0 ]
        [ 0  0  0 ]
Computing Singular Value Decomposition 3
Now we construct the singular value matrix Σ
▶ The matrix Σ has the same dimensions as A. In this case Σ is
hence a 2 × 3 matrix.
▶ The diagonal entries of this submatrix are obtained by taking
the square roots of the eigenvalues 6 and 1, respectively.
▶ The singular-value matrix Σ is given by:
    Σ = [ √6  0  0 ]
        [  0  1  0 ]
▶ The last column is a column of zeros only.
Computing Singular Value Decomposition 4
The left singular vectors are obtained as the normalized images of the
right singular vectors. Recall that ui = (1/σi) Avi .
▶ The first vector
    u1 = (1/σ1) A v1 = [ 1/√5 , −2/√5 ]ᵀ
▶ The second vector
    u2 = (1/σ2) A v2 = [ 2/√5 , 1/√5 ]ᵀ
Final Step : Combining U, Σ and V
We compile all three matrices together to generate the SVD
▶   A = U Σ Vᵀ

        [  1/√5   2/√5 ] [ √6  0  0 ] [  5/√30    0     −1/√6 ]ᵀ
      = [ −2/√5   1/√5 ] [  0  1  0 ] [ −2/√30   1/√5   −2/√6 ]
                                      [  1/√30   2/√5    1/√6 ]

▶ The matrix U is a 2 × 2 matrix satisfying the orthogonality
property.
▶ The matrix V is a 3 × 3 matrix satisfying the orthogonality
property.
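For comparison, NumPy's built-in routine produces the same singular values for this example (an illustrative check; the column signs of U and V may differ).

```python
# Cross-check the worked example with np.linalg.svd.
import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [-2.0, 1.0, 0.0]])

U, s, Vt = np.linalg.svd(A)
print(np.round(s**2, 6))        # squared singular values: [6. 1.]

Sigma = np.zeros_like(A)
Sigma[:2, :2] = np.diag(s)
print(np.allclose(U @ Sigma @ Vt, A))   # True
```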
Comparing SVD and EVD
▶ The left-singular vectors of A are eigenvectors of AAT
▶ The right-singular vectors of A are eigenvectors of AT A
▶ The non-zero singular values of A are the square roots of the
nonzero eigenvalues of AT A.
▶ The SVD always exists for any matrix in Rm×n
▶ The eigendecomposition is only defined for square matrices in
Rn×n and only exists if we can find a basis of eigenvectors of
Rn
Comparing SVD and EVD
▶ The vectors in the eigendecomposition matrix P are not
necessarily orthogonal.
▶ On the other hand, the vectors in the matrices U and V in the
SVD are orthonormal.
▶ Both the eigendecomposition and the SVD are compositions
of three linear mappings.
▶ A key difference between the eigendecomposition and the
SVD is that in the SVD, domain and codomain can be of
different dimensions.
▶ In the SVD, the left and right singular vector matrices U and
V are generally not inverses of each other.
Comparing SVD and EVD 3
▶ In the eigendecomposition, the matrices P and P⁻¹ in the
decomposition are inverses of each other.
▶ In the SVD, the entries in the diagonal matrix Σ are all real
and nonnegative.
▶ In the eigendecomposition, the diagonal entries need not
always be real.
▶ The left-singular vectors of A are eigenvectors of AAᵀ.
▶ The right-singular vectors of A are eigenvectors of AᵀA.
Matrix Approximation
▶ We considered the SVD as a way to factorize A = UΣVT into
the product of three matrices, where U and V are orthogonal
and Σ contains the singular values on its main diagonal.
▶ Instead of doing the full SVD factorization, we will now
investigate how the SVD allows us to represent a matrix A as
a sum of simpler matrices Ai
▶ This representation lends itself to a matrix
approximation scheme that is cheaper to compute than the
full SVD.
Matrix Approximation continued
▶ A matrix A ∈ Rm×n of rank r can be written as a sum of
rank-1 matrices so that A = Σ_{i=1}^{r} σi ui viᵀ
▶ The diagonal structure of the singular value matrix Σ
multiplies only matching left and right singular vectors ui viT
and scales them by the corresponding singular value σi .
▶ All cross terms ui vjᵀ with i ≠ j vanish because Σ is a diagonal
matrix (Σij = 0 for i ≠ j).
▶ Any term for i > r would vanish because the corresponding
singular value is 0.
Rank k Approximation
▶ We summed up the r individual rank-1 matrices to obtain a
rank r matrix A.
▶ If the sum does not run over all rank-1 matrices Ai = σi ui viᵀ,
i = 1, ..., r , but only up to an intermediate value k, we obtain a
rank-k approximation.
▶ The approximation represented by Â(k) is defined as follows
    Â(k) = Σ_{i=1}^{k} σi ui viᵀ
▶ To measure the difference between A and its rank-k
approximation we need the notion of a norm which is
introduced next
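A minimal sketch of the rank-k approximation built from NumPy's SVD (the 4 × 3 matrix is a made-up example); the error printed at the end anticipates the spectral norm introduced next.

```python
# Rank-k approximation A_hat(k) = sum_{i<=k} sigma_i u_i v_i^T.
import numpy as np

def rank_k_approx(A, k):
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

A = np.array([[3.0, 1.0, 1.0],
              [1.0, 3.0, 1.0],
              [1.0, 1.0, 3.0],
              [0.0, 2.0, 2.0]])

A1 = rank_k_approx(A, 1)
print(np.linalg.norm(A - A1, 2))               # spectral-norm error
print(np.linalg.svd(A, compute_uv=False)[1])   # equals the second singular value
```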
Spectral Norm of a matrix
▶ We introduce the notation of a subscript in the matrix norm
▶ Spectral norm of a matrix: for x ∈ Rn , x ≠ 0, the spectral
norm of a matrix A ∈ Rm×n is defined as
    ∥A∥2 = max_x ∥Ax∥2 / ∥x∥2
where ∥y∥2 is the Euclidean norm of y.
▶ Theorem : The spectral norm of a matrix A is its largest
singular value
Example : Spectral Norm of a matrix
▶ Example : Consider the following matrix A
    A = [ 1  2 ]
        [ 3  4 ]
▶ Singular value decomposition of this matrix will provide the
matrix Σ as follows
    Σ = [ 5.465    0   ]
        [   0    0.366 ]
▶ The 2 singular values are 5.4650 and 0.366.
▶ By definition the spectral norm is the largest singular value.
▶ Hence, the spectral norm is 5.4650
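A one-line check with NumPy (illustrative): the spectral norm returned by np.linalg.norm(A, 2) coincides with the largest singular value.

```python
# Spectral norm of the example matrix equals its largest singular value.
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

print(np.linalg.norm(A, 2))                    # ~5.465
print(np.linalg.svd(A, compute_uv=False)[0])   # same value
```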
Lecture 6
Math Foundations Team
Introduction
Many algorithms in machine learning optimize an objective
function with respect to a set of desired model parameters that
control how well a model explains the data: Finding good
parameters can be phrased as an optimization problem.
Examples include: linear regression, where we look at curve-fitting
problems and optimize linear weight parameters to maximize the
likelihood; neural-network auto-encoders for dimensionality
reduction and data compression.
Differentiation of Univariate Functions
For h > 0, the derivative of f at x is defined as the limit
    df/dx = lim_{h→0} ( f(x + h) − f(x) ) / h          (1)
The derivative of f points in the direction of steepest ascent of f .
Derivative of a Polynomial
To compute the derivative of f(x) = xⁿ, n ∈ N, using the definition,
where C(n, i) = n!/(i!(n − i)!) denotes the binomial coefficient:

    df/dx = lim_{h→0} ( f(x + h) − f(x) ) / h
          = lim_{h→0} ( (x + h)ⁿ − xⁿ ) / h
          = lim_{h→0} ( Σ_{i=0}^{n} C(n, i) x^{n−i} h^{i} − xⁿ ) / h        (2)
          = lim_{h→0} ( Σ_{i=1}^{n} C(n, i) x^{n−i} h^{i} ) / h
Derivative of a Polynomial
    df/dx = lim_{h→0} Σ_{i=1}^{n} C(n, i) x^{n−i} h^{i−1}
          = lim_{h→0} C(n, 1) x^{n−1} + lim_{h→0} Σ_{i=2}^{n} C(n, i) x^{n−i} h^{i−1}     (3)
          = n x^{n−1}
Taylor polynomial
The Taylor polynomial is a representation of a function f as a
finite sum of terms. These terms are determined using derivatives
of f evaluated at x0 .
Definition: The Taylor polynomial of degree n of f : R → R at x0
is defined as
    Tn(x) = Σ_{k=0}^{n} ( f^{(k)}(x0) / k! ) (x − x0)^k          (4)

where f^{(k)}(x0) is the k-th derivative of f at x0 , which we assume
exists.
Taylor series
Definition: The Taylor series of a smooth (infinitely many times
continuously differentiable) function f : R → R at x0 is defined as

    T∞(x) = Σ_{k=0}^{∞} ( f^{(k)}(x0) / k! ) (x − x0)^k          (5)

For x0 = 0, we obtain the Maclaurin series as a special instance
of the Taylor series.
Remark: In general, a Taylor polynomial of degree n is an
approximation of a function, which does not need to be a
polynomial. The Taylor polynomial is similar to f in a
neighborhood around x0 . However, a Taylor polynomial of degree n
is an exact representation of a polynomial f of degree k ≤ n since
all derivatives f (i) = 0, for i > k.
Taylor Polynomial example
Consider the polynomial f(x) = x⁴. Find the Taylor polynomial T6
evaluated at x0 = 1.
We compute f^{(k)}(1) for k = 0, 1, 2, . . . , 6:
f(1) = 1, f′(1) = 4, f′′(1) = 12, f^{(3)}(1) = 24, f^{(4)}(1) = 24,
f^{(5)}(1) = 0, f^{(6)}(1) = 0. The desired Taylor polynomial is
    T6(x) = Σ_{k=0}^{6} ( f^{(k)}(x0) / k! ) (x − x0)^k
          = 1 + 4(x − 1) + (12/2!)(x − 1)² + (24/3!)(x − 1)³ + (24/4!)(x − 1)⁴
          = 1 + 4(x − 1) + 6(x − 1)² + 4(x − 1)³ + (x − 1)⁴
          = x⁴ = f(x)                                             (6)

We obtain an exact representation of the original function.
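The exactness can also be verified symbolically; the following SymPy snippet (an illustrative check, not part of the slides) reassembles T6 from the derivatives at x0 = 1.

```python
# Degree-6 Taylor polynomial of f(x) = x**4 around x0 = 1.
import sympy as sp

x = sp.symbols('x')
f = x**4
x0 = 1

T6 = sum(f.diff(x, k).subs(x, x0) / sp.factorial(k) * (x - x0)**k for k in range(7))
print(sp.expand(T6))   # x**4, i.e. the Taylor polynomial equals f exactly
```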
Taylor Series example
Consider the smooth function f (x) = sin(x) + cos(x). We compute
Taylor series expansion of f at x0 = 0, which is the Maclaurin
series expansion of f . We obtain the following derivatives:
    f(0) = sin(0) + cos(0) = 1
    f′(0) = cos(0) − sin(0) = 1
    f′′(0) = −sin(0) − cos(0) = −1
    f^{(3)}(0) = −cos(0) + sin(0) = −1
    f^{(4)}(0) = sin(0) + cos(0) = f(0) = 1
The coefficients in our Taylor series are only ±1 (since sin(0) = 0),
each of which occurs twice before switching to the other one.
Furthermore, f^{(k+4)}(0) = f^{(k)}(0).
Taylor Series example
Therefore, the full Taylor series expansion of f at x0 = 0 is given by
    T∞(x) = Σ_{k=0}^{∞} ( f^{(k)}(x0) / k! ) (x − x0)^k
          = 1 + x − (1/2!) x² − (1/3!) x³ + (1/4!) x⁴ + (1/5!) x⁵ − . . .
          = 1 − (1/2!) x² + (1/4!) x⁴ ∓ . . . + x − (1/3!) x³ + (1/5!) x⁵ ∓ . . .      (7)
          = Σ_{k=0}^{∞} (−1)^k x^{2k} / (2k)!  +  Σ_{k=0}^{∞} (−1)^k x^{2k+1} / (2k+1)!
          = cos(x) + sin(x)
Differentiation Rules
We denote the derivative of f by f′.
▶ Product Rule: (f(x)g(x))′ = f′(x)g(x) + f(x)g′(x)
▶ Sum Rule: (f(x) + g(x))′ = f′(x) + g′(x)
▶ Quotient Rule: ( f(x)/g(x) )′ = ( f′(x)g(x) − f(x)g′(x) ) / (g(x))²
▶ Chain Rule: (g(f(x)))′ = (g ∘ f)′(x) = g′(f(x)) f′(x)
Example: Chain Rule
Compute the derivative of the function h(x) = (2x + 1)⁴.
    h(x) = (2x + 1)⁴ = g(f(x)),
    f(x) = 2x + 1,
    g(f) = f⁴.
The derivatives of f and g are
    f′(x) = 2,
    g′(f) = 4f³,
so
    h′(x) = g′(f) f′(x) = (4f³) · 2 = 8(2x + 1)³
Partial Differentiation and Gradients
Differentiation applies to functions f of a scalar variable x ∈ R. In
the following, we consider the general case where the function f
depends on one or more variables x ∈ R n , e.g.,f (x) = f (x1 , x2 ).
The generalization of the derivative to functions of several
variables is the gradient. We find the gradient of the function f
with respect to x by varying one variable at a time and keeping the
others constant. The gradient is then the collection of these partial
derivatives.
Partial derivatives and Gradients
Definition: For a function f : Rn → R, x → f (x), x ∈ R n of n
variables x1 , . . . , xn we define the partial derivatives as
    ∂f/∂x1 = lim_{h→0} ( f(x1 + h, x2 , . . . , xn) − f(x1 , x2 , . . . , xn) ) / h
    ∂f/∂x2 = lim_{h→0} ( f(x1 , x2 + h, . . . , xn) − f(x1 , x2 , . . . , xn) ) / h
    ...
    ∂f/∂xn = lim_{h→0} ( f(x1 , x2 , . . . , xn + h) − f(x1 , x2 , . . . , xn) ) / h
We collect them in the row vector called the gradient of f or
Jacobian:

    ∇x f = grad f = df/dx = [ ∂f(x)/∂x1 , ∂f(x)/∂x2 , . . . , ∂f(x)/∂xn ]      (8)
Example 1: Find the partial derivatives of f(x, y) = (x + 2y³)²

    ∂f(x, y)/∂x = 2(x + 2y³) · ∂(x + 2y³)/∂x = 2(x + 2y³)            (9)

    ∂f(x, y)/∂y = 2(x + 2y³) · ∂(x + 2y³)/∂y = 12y²(x + 2y³)        (10)

Here we used the chain rule to compute the partial derivatives.
Example 2
Find the partial derivatives of f(x1 , x2) = x1² x2 + x1 x2³

    ∂f(x1 , x2)/∂x1 = 2x1 x2 + x2³                                  (11)

    ∂f(x1 , x2)/∂x2 = x1² + 3x1 x2²                                 (12)

So the gradient is then

    df/dx = [ ∂f(x1 , x2)/∂x1 , ∂f(x1 , x2)/∂x2 ]
          = [ 2x1 x2 + x2³ , x1² + 3x1 x2² ] ∈ R^{1×2}              (13)
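The same gradient can be obtained symbolically; the short SymPy sketch below (illustrative, not from the slides) reproduces (11)–(13).

```python
# Gradient of f(x1, x2) = x1**2 * x2 + x1 * x2**3 as a row vector.
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = x1**2 * x2 + x1 * x2**3

grad = sp.Matrix([[sp.diff(f, x1), sp.diff(f, x2)]])
print(grad)   # Matrix([[2*x1*x2 + x2**3, x1**2 + 3*x1*x2**2]])
```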
Basic rules of partial differentiation
When we compute derivatives with respect to vectors x ∈ Rn we
need to pay attention: Our gradients now involve vectors and
matrices, and matrix multiplication is not commutative i.e., the
order matters.
    Product rule:  ∂/∂x ( f(x) g(x) ) = (∂f/∂x) g(x) + f(x) (∂g/∂x)        (14)

    Sum rule:      ∂/∂x ( f(x) + g(x) ) = ∂f/∂x + ∂g/∂x                    (15)

    Chain rule:    ∂/∂x (g ∘ f)(x) = ∂/∂x ( g(f(x)) ) = (∂g/∂f)(∂f/∂x)     (16)
Chain Rule
Consider a function f : R² → R of two variables x1 , x2 .
Furthermore, x1(t) and x2(t) are themselves functions of t.
To compute the gradient of f with respect to t, we need to apply
the chain rule for multivariate functions as

    df/dt = [ ∂f/∂x1  ∂f/∂x2 ] [ ∂x1(t)/∂t ]  =  (∂f/∂x1)(∂x1/∂t) + (∂f/∂x2)(∂x2/∂t)     (17)
                               [ ∂x2(t)/∂t ]
where d denotes the gradient and ∂ partial derivatives.
Example
Consider f(x1 , x2) = x1² + 2x2 , where x1 = sin t and x2 = cos t, then

    df/dt = (∂f/∂x1)(∂x1/∂t) + (∂f/∂x2)(∂x2/∂t)
          = 2 sin t · (∂ sin t/∂t) + 2 · (∂ cos t/∂t)
          = 2 sin t cos t − 2 sin t = 2 sin t (cos t − 1)
is the corresponding derivative of f with respect to t.
If f (x1 , x2 ) is a function of x1 and x2 , where x1 (s, t) and x2 (s, t)
are themselves functions of two variables s and t, the chain rule
yields the partial derivatives:
∂f ∂f ∂x1 ∂f ∂x2
= + (18)
∂s ∂x1 ∂s ∂x2 ∂s
∂f ∂f ∂x1 ∂f ∂x2
= + (19)
∂t ∂x1 ∂t ∂x2 ∂t
and the gradient is obtained by the matrix multiplication
    df/d(s, t) = (∂f/∂x) (∂x/∂(s, t))
               = [ ∂f/∂x1  ∂f/∂x2 ] [ ∂x1/∂s  ∂x1/∂t ]
                                    [ ∂x2/∂s  ∂x2/∂t ]
Gradients of Vector-Valued Functions
We have discussed partial derivatives and gradients of functions
f : Rn → R mapping to the real numbers. Now we will generalize
the concept of the gradient to vector-valued functions
f : Rn → Rm , where n ≥ 1 and m > 1.
For a function f : Rn → Rm and a vector x = [x1 , . . . , xn]ᵀ the
corresponding vector of function values is given as

    f(x) = [ f1(x) , . . . , fm(x) ]ᵀ ∈ Rm                          (20)

where each fi : Rn → R.
Gradients of Vector-Valued Functions
Therefore, the partial derivative of a vector-valued function
f : Rn → Rm w.r.t. xi ∈ R, i = 1, . . . , n, is given as the vector

    ∂f/∂xi = [ ∂f1/∂xi , . . . , ∂fm/∂xi ]ᵀ
           = [ lim_{h→0} ( f1(x1 , ..., xi−1 , xi + h, xi+1 , ..., xn) − f1(x) ) / h ,
               . . . ,
               lim_{h→0} ( fm(x1 , ..., xi−1 , xi + h, xi+1 , ..., xn) − fm(x) ) / h ]ᵀ ∈ Rm
Gradients of Vector-Valued Functions
We know that the gradient of f with respect to a vector is the row
vector of the partial derivatives. Every partial derivative ∂f/∂xi is
itself a column vector. Therefore, we obtain the gradient of
f : Rn → Rm with respect to x ∈ Rn by collecting these partial
derivatives:

    df(x)/dx = [ ∂f(x)/∂x1  . . .  ∂f(x)/∂xn ]

               [ ∂f1(x)/∂x1  . . .  ∂f1(x)/∂xn ]
             = [    ...                ...     ]  ∈ Rm×n
               [ ∂fm(x)/∂x1  . . .  ∂fm(x)/∂xn ]
Example 1: Gradients of Vector-Valued Functions
Given f (x) = Ax, f (x) ∈ RM , A ∈ RM×N , x ∈ RN
Since f : RN → RM , it follows that df /dx ∈ RM×N . To compute
the gradient we determine the partial derivatives of f w.r.t xj :
    fi(x) = Σ_{j=1}^{N} Aij xj   =⇒   ∂fi/∂xj = Aij                 (21)
We obtain the gradient using the Jacobian:

            [ ∂f1/∂x1  . . .  ∂f1/∂xN ]   [ A11  . . .  A1N ]
    df/dx = [   ...              ...  ] = [  ...        ... ] = A ∈ RM×N     (22)
            [ ∂fM/∂x1  . . .  ∂fM/∂xN ]   [ AM1  . . .  AMN ]
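As a numerical sanity check (an illustrative sketch; A and x0 are made-up values), a central-difference Jacobian of f(x) = Ax recovers A.

```python
# Finite-difference Jacobian of f(x) = A x equals A.
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, -1.0, 3.0]])
f = lambda x: A @ x

x0 = np.array([0.5, -1.0, 2.0])
eps = 1e-6
J = np.column_stack([(f(x0 + eps * e) - f(x0 - eps * e)) / (2 * eps)
                     for e in np.eye(3)])
print(np.allclose(J, A))   # True
```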
Example 2: Gradients of Vector-Valued Functions
Consider the function h : R → R, h(t) = (f ∘ g)(t) with

    f(x) = exp(x1 x2²),
    x = [ x1 , x2 ]ᵀ = g(t) = [ t cos t , t sin t ]ᵀ                  (23)

and compute the gradient of h w.r.t. t. Since f : R² → R and
g : R → R² we note that

    ∂f/∂x ∈ R^{1×2}  and  ∂g/∂t ∈ R^{2×1}                            (24)
The desired gradient is computed by applying the chain rule:

    dh/dt = (∂f/∂x)(∂x/∂t) = [ ∂f/∂x1  ∂f/∂x2 ] [ ∂x1/∂t ]
                                                [ ∂x2/∂t ]

          = [ exp(x1 x2²) x2²   2 exp(x1 x2²) x1 x2 ] [ cos t − t sin t ]
                                                      [ sin t + t cos t ]

          = exp(x1 x2²) ( x2² (cos t − t sin t) + 2 x1 x2 (sin t + t cos t) )

where x1 = t cos t and x2 = t sin t.
Lecture 7
Math Foundations Team
Introduction
▶ In the last lecture, we discussed differentiation of univariate
functions, partial differentiation, gradients, and gradients of
vector-valued functions.
▶ Now we will look into gradients of matrices and some useful
identities for computing gradients.
▶ Finally, we will discuss backpropagation and automatic
differentiation.
Gradients of Matrices
For the gradient of an m × n matrix A with respect to a p × q matrix
B, the resulting Jacobian is an (m × n) × (p × q) object, i.e., a
four-dimensional tensor J, whose entries are given as

    Jijkl = ∂Aij / ∂Bkl

Since we can consider Rm×n as Rmn , we can reshape our matrices
into vectors of length mn and pq respectively. The gradient between
these vectors then results in a Jacobian of size mn × pq.
Gradients of Matrices
Gradients of Matrices
Let f = Ax where A ∈ Rm×n and x ∈ Rn , then

    ∂f/∂A ∈ R^{m×(m×n)}

By definition,

              [ ∂f1/∂A ]
    ∂f/∂A =   [   ...  ] ,    ∂fi/∂A ∈ R^{1×(m×n)}
              [ ∂fm/∂A ]
Gradients of Matrices
Now, we have

    fi = Σ_{j=1}^{n} Aij xj ,   i = 1, · · · , m.

Therefore, by taking partial derivatives with respect to Aiq

    ∂fi/∂Aiq = xq .

Hence, the i-th row becomes

    ∂fi/∂Ai,: = xᵀ ∈ R^{1×1×n}
    ∂fi/∂Ak,: = 0ᵀ ∈ R^{1×1×n} ,  for k ≠ i
Hence, by stacking these partial derivatives over all rows of A, we get

              [ 0ᵀ  ]
              [ ... ]
    ∂fi/∂A =  [ xᵀ  ]  ∈ R^{1×m×n}
              [ ... ]
              [ 0ᵀ  ]

where xᵀ appears in the i-th position.
Gradients of Matrices with respect to Matrices
Let B ∈ Rm×n and f : Rm×n → Rn×n with
f (B) = B T B =: K ∈ Rn×n
Then, we have

    ∂K/∂B ∈ R^{(n×n)×(m×n)} .

Moreover,

    ∂Kpq/∂B ∈ R^{1×(m×n)} ,  for p, q = 1, · · · , n

where Kpq is the (p, q)-th entry of K = f(B).
Gradients of Matrices with respect to Matrices
Let the i-th column of B be bi , then

    Kpq = bpᵀ bq = Σ_{l=1}^{m} Blp Blq

Computing the partial derivative, we get

    ∂Kpq/∂Bij = Σ_{l=1}^{m} ∂/∂Bij ( Blp Blq ) = ∂pqij
Gradients of Matrices with respect to Matrices
Clearly, we have

    ∂pqij = Biq   if j = p, p ≠ q
    ∂pqij = Bip   if j = q, p ≠ q
    ∂pqij = 2Biq  if j = p, p = q
    ∂pqij = 0     otherwise

where p, q, j = 1, · · · , n and i = 1, · · · , m.
Useful Identities for Computing Gradients
▶ ∂/∂X ( f(X)ᵀ ) = ( ∂f(X)/∂X )ᵀ
▶ ∂/∂X ( tr(f(X)) ) = tr( ∂f(X)/∂X )
▶ ∂/∂X ( det(f(X)) ) = det(f(X)) tr( f(X)⁻¹ ∂f(X)/∂X )
▶ ∂/∂X ( f(X)⁻¹ ) = −f(X)⁻¹ ( ∂f(X)/∂X ) f(X)⁻¹
Useful Identities for Computing Gradients
▶ ∂( aᵀ X⁻¹ b )/∂X = −( X⁻¹ )ᵀ a bᵀ ( X⁻¹ )ᵀ
▶ ∂( xᵀ a )/∂x = aᵀ
▶ ∂( aᵀ x )/∂x = aᵀ
▶ ∂( aᵀ X b )/∂X = a bᵀ
▶ ∂( xᵀ B x )/∂x = xᵀ ( B + Bᵀ )
▶ ∂/∂s ( (x − As)ᵀ W (x − As) ) = −2 (x − As)ᵀ W A
for symmetric W .
Backpropagation and Automatic Differentiation
Consider the function
    f(x) = √( x² + exp(x²) ) + cos( x² + exp(x²) )

Taking derivatives,

    df/dx = ( 2x + 2x exp(x²) ) / ( 2 √( x² + exp(x²) ) )
            − sin( x² + exp(x²) ) ( 2x + 2x exp(x²) )

          = 2x ( 1 / ( 2 √( x² + exp(x²) ) ) − sin( x² + exp(x²) ) ) ( 1 + exp(x²) )
Motivation
▶ A naive implementation of the gradient can be significantly
more expensive than computing the function itself; such lengthy
expressions impose unnecessary overhead.
▶ We need an efficient way to compute the gradient of an error
function with respect to the parameters of the model.
▶ For training deep neural network models, the backpropagation
algorithm is one such method.
Backpropagation and Automatic Differentiation
In neural networks with multiple layers
fi (xi−1 ) = σ(Ai−1 xi−1 + bi−1 )
where xi−1 is the output of layer i − 1 and σ is an activation
function.
Backpropagation
To train these models, the gradient of the loss function L with
respect to all model parameters θj = {Aj , bj }, j = 0, · · · , K − 1, and the
inputs of each layer needs to be computed. Consider

    f0 := x
    fi := σi ( Ai−1 fi−1 + bi−1 ) ,  i = 1, · · · , K .

We have to find θj = {Aj , bj }, j = 0, · · · , K − 1 such that

    L(θ) = ∥ y − fK(θ, x) ∥²

is minimized, where θ = {A0 , b0 , · · · , AK−1 , bK−1 }.
Backpropagation
Using the chain rule, we get

    ∂L/∂θK−1 = (∂L/∂fK) (∂fK/∂θK−1)

    ∂L/∂θK−2 = (∂L/∂fK) (∂fK/∂fK−1) (∂fK−1/∂θK−2)

    ∂L/∂θK−3 = (∂L/∂fK) (∂fK/∂fK−1) (∂fK−1/∂fK−2) (∂fK−2/∂θK−3)

    ∂L/∂θi   = (∂L/∂fK) (∂fK/∂fK−1) · · · (∂fi+2/∂fi+1) (∂fi+1/∂θi)
Backpropagation
If the partial derivatives ∂L/∂θi+1 are computed, then the computation
can be reused to compute ∂L/∂θi .
Example
Consider the function

    f(x) = √( x² + exp(x²) ) + cos( x² + exp(x²) )

Let

    a = x² , b = exp(a) , c = a + b , d = √c , e = cos(c)  ⇒  f = d + e
Example
    ⇒ ∂a/∂x = 2x
      ∂b/∂a = exp(a)
      ∂c/∂a = 1 = ∂c/∂b
      ∂d/∂c = 1 / (2√c)
      ∂e/∂c = −sin(c)
      ∂f/∂d = 1 = ∂f/∂e
Example
Thus, we have
    ∂f/∂c = (∂f/∂d)(∂d/∂c) + (∂f/∂e)(∂e/∂c)
    ∂f/∂b = (∂f/∂c)(∂c/∂b)
    ∂f/∂a = (∂f/∂b)(∂b/∂a) + (∂f/∂c)(∂c/∂a)
    ∂f/∂x = (∂f/∂a)(∂a/∂x)
Example
Substituting the results, we get
    ∂f/∂c = 1 · ( 1 / (2√c) ) + 1 · ( −sin(c) )
    ∂f/∂b = (∂f/∂c) · 1
    ∂f/∂a = (∂f/∂b) exp(a) + (∂f/∂c) · 1
    ∂f/∂x = (∂f/∂a) · 2x
Thus, the computation for calculating the derivative is of similar
complexity as the computation of the function itself.
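The forward and backward passes of this example translate directly into code. The following sketch (illustrative; the evaluation point x = 0.5 is arbitrary) mirrors the intermediate variables a, b, c, d, e defined above and reuses them in the backward pass.

```python
# Forward pass to evaluate f, backward pass to accumulate df/dx.
import math

def f_and_grad(x):
    # forward pass: build the intermediate variables
    a = x * x
    b = math.exp(a)
    c = a + b
    d = math.sqrt(c)
    e = math.cos(c)
    f = d + e
    # backward pass: apply the chain rule in reverse order, reusing results
    df_dd, df_de = 1.0, 1.0
    df_dc = df_dd * (1.0 / (2.0 * math.sqrt(c))) + df_de * (-math.sin(c))
    df_db = df_dc * 1.0
    df_da = df_db * math.exp(a) + df_dc * 1.0
    df_dx = df_da * 2.0 * x
    return f, df_dx

value, grad = f_and_grad(0.5)
print(value, grad)   # the gradient matches the closed-form derivative above
```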
Formalization of Automatic Differentiation
Let x1 , · · · , xd : input variables.
xd+1 , · · · , xD−1 : intermediate variables.
xD : output variable, then we have,
xi = gi (xPa(xi ) )
Note that the gi are elementary functions, also called forward
propagation functions, and xPa(xi) is the set of parent nodes
of the variable xi in the graph.
Formalization of Automatic Differentiation
Now,

    f = xD   ⇒   ∂f/∂xD = 1

For other variables, using the chain rule, we get

    ∂f/∂xi = Σ_{xj : xi ∈ Pa(xj)} (∂f/∂xj)(∂xj/∂xi)
           = Σ_{xj : xi ∈ Pa(xj)} (∂f/∂xj)(∂gj/∂xi)

The last equation is the backpropagation of the gradient through
the computation graph. For neural network training, we
backpropagate the error of the prediction with respect to the label.
Lecture 8
Math Foundations Team
Introduction
▶ Till now we have discussed Taylor/Maclaurin series,
partial derivatives and gradients.
▶ Now we are interested in higher-order derivatives.
▶ We also discuss the multivariate Taylor series and its use in the
expansion of a function of several variables.
Higher-Order Derivatives
Consider a function f : R2 → R
Notations for higher-order partial derivatives:

∂²f/∂x² : second partial derivative of f with respect to x
∂ⁿf/∂xⁿ : n-th partial derivative of f with respect to x
∂²f/∂y∂x = ∂/∂y ( ∂f/∂x ) : the partial derivative obtained by first
partially differentiating with respect to x and then with respect to y
∂²f/∂x∂y = ∂/∂x ( ∂f/∂y ) : the partial derivative obtained by first
partially differentiating with respect to y and then with respect to x
Hessian Matrix
The Hessian is the collection of all second-order partial derivatives.
If f(x, y) is a twice (continuously) differentiable function, then
∂²f/∂x∂y = ∂²f/∂y∂x , i.e., the order of differentiation does not
matter, and the corresponding Hessian matrix

        [ ∂²f/∂x²    ∂²f/∂x∂y ]
    H = [ ∂²f/∂x∂y   ∂²f/∂y²  ]

is symmetric. The Hessian is denoted as ∇²_{x,y} f(x, y).
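SymPy can assemble the Hessian directly; the sketch below (illustrative, not from the slides) uses the two-variable function f(x, y) = x² + 2xy + y³ that appears later in this lecture.

```python
# Hessian of f(x, y) = x**2 + 2*x*y + y**3 and its value at (1, 2).
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 + 2*x*y + y**3

H = sp.hessian(f, (x, y))
print(H)               # Matrix([[2, 2], [2, 6*y]]) -- symmetric, as expected
print(H.subs({y: 2}))  # Hessian at y = 2 (it does not depend on x): Matrix([[2, 2], [2, 12]])
```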
Linearization and Multivariate Taylor Series
The gradient ∇f of a function f is often used for a locally linear
approximation of f around x0 :
f (x) ≈ f (x0 ) + (∇x f )(x0 )(x − x0 ) (1)
Here (∇x f)(x0) is the gradient of f with respect to x, evaluated
at x0 . The figure illustrates the linear approximation of a function f at
an input x0 . The original function is approximated by a straight
line.
Linearization and Multivariate Taylor Series...
This approximation is locally accurate, but the farther we move
away from x0 the worse the approximation gets. Equation (1) is a
special case of a multivariate Taylor series expansion of f at x0 ,
where we consider only the first two terms. We discuss the more
general case in the following, which will allow for better
approximations.
Multivariate Taylor Series
Consider a function f : R^D → R, x ↦ f(x), x ∈ R^D , that is smooth
at x0 . When we define the difference vector δ := x − x0 , the
multivariate Taylor series of f at x0 is defined as

    f(x) = Σ_{k=0}^{∞} ( Dx^k f(x0) / k! ) δ^k                        (2)

where Dx^k f(x0) is the k-th (total) derivative of f with respect to x,
evaluated at x0 .
Taylor Polynomial
The Taylor polynomial of degree n of f at x0 contains the first
n + 1 components of the series in (2) and is defined as

    Tn(x) = Σ_{k=0}^{n} ( Dx^k f(x0) / k! ) δ^k                       (3)
In (2) and (3), we used the slightly sloppy notation of δ^k , which is
not defined for vectors x ∈ R^D , D > 1, and k > 1. Note that both
Dx^k f and δ^k are k-th order tensors, i.e., k-dimensional arrays.
Taylor Polynomial...
In general, we obtain the terms in the Taylor series, where
Dx^k f(x0) δ^k contains k-th order polynomials. Now that we have
defined the Taylor series for vector fields, let us explicitly write down
the first terms Dx^k f(x0) δ^k of the Taylor series expansion.
Taylor Series Expansion of a Function with Two Variables
Consider the function f (x, y ) = x 2 + 2xy + y 3 .
We want to compute the Taylor series expansion of f at
(x0 , y0 ) = (1, 2).
Before we start, let us discuss what to expect: The function in
f (x, y ) is a polynomial of degree 3. We are looking for a Taylor
series expansion,which itself is a linear combination of polynomials.
Therefore, we do not expect the Taylor series expansion to contain
terms of fourth or higher order to express a third-order polynomial.
This means that it should be sufficient to determine the first four
terms of f(x) = Σ_{k=0}^{∞} ( Dx^k f(x0) / k! ) δ^k for an exact alternative
representation of f(x, y). To determine the Taylor series
expansion, we start with the constant term and the first-order
derivatives, which are given by f(1, 2) = 13, ∂f/∂x = 2x + 2y (so
∂f/∂x (1, 2) = 6) and ∂f/∂y = 2x + 3y² (so ∂f/∂y (1, 2) = 14).
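A symbolic cross-check of this whole example (an illustrative SymPy sketch, not from the slides): substituting x = 1 + dx, y = 2 + dy and expanding reproduces exactly the constant, linear, quadratic and cubic Taylor terms around (1, 2).

```python
# Expand f(x, y) = x**2 + 2*x*y + y**3 around (x0, y0) = (1, 2).
import sympy as sp

dx, dy = sp.symbols('dx dy')
x, y = 1 + dx, 2 + dy
f_shifted = sp.expand(x**2 + 2*x*y + y**3)

print(f_shifted)
# 13 + 6*dx + 14*dy + dx**2 + 2*dx*dy + 6*dy**2 + dy**3  (up to term ordering),
# matching f(1, 2) = 13, the gradient (6, 14), the Hessian term and the cubic term.
```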
Taylor Series Expansion of a Function with Two Variables...
When we collect the second-order partial derivatives, we obtain the
Hessian

        [ ∂²f/∂x²    ∂²f/∂x∂y ]   [ 2   2  ]                [ 2   2 ]
    H = [ ∂²f/∂x∂y   ∂²f/∂y²  ] = [ 2  6y  ] ,   H(1, 2) =  [ 2  12 ]
Taylor Series Expansion of a Function with Two Variables...
The third-order derivatives are obtained by differentiating the entries
of the Hessian once more with respect to x and y.
Taylor Series Expansion of a Function with Two Variables...
Since most second-order partial derivatives in the Hessian are
constant, the only nonzero third-order partial derivative is
∂³f/∂y³ = 6, which gives ∂³f/∂y³ (1, 2) = 6. Higher-order derivatives
and the mixed derivatives of degree 3 (e.g., ∂³f/∂x²∂y) vanish, such that

    Dx³ f(1, 2) δ³ = 6 (y − 2)³ ,
which collects all cubic terms of the Taylor series. Overall, the
(exact) Taylor series expansion of f at (x0 , y0 ) = (1, 2) is