COMPREHENSIVE STUDY GUIDE:
MAT4101
Linear Algebra and Optimization for Data Science
Target Audience: MSc Data Science
Reference Texts:
• Linear Algebra and Learning from Data, Gilbert Strang (2019)
• Convex Optimization, Boyd & Vandenberghe (2004)
• Mathematics for Machine Learning, Deisenroth et al. (2019)
PART I: LINEAR ALGEBRA CORE (Modules 1 & 2)
Module 1: Matrix Foundations & Subspaces
Syllabus Focus: Matrix multiplication, LU/Cholesky decompositions, rank, vector spaces,
the Four Fundamental Subspaces.
Key Concepts
1. The Four Fundamental Subspaces:
For any m×n matrix A with rank r:
o Column Space C(A): dimension r. The space of all attainable outputs b in Ax = b.
o Row Space C(Aᵀ): dimension r. The space spanned by the rows.
o Null Space N(A): dimension n−r. All solutions to Ax = 0.
o Left Null Space N(Aᵀ): dimension m−r.
o Orthogonality: N(A) ⊥ C(Aᵀ) and N(Aᵀ) ⊥ C(A).
2. LU Decomposition:
Factoring A = LU (lower triangular × upper triangular). Used to solve systems efficiently
without recomputing inverses.
3. Cholesky Decomposition:
If A is symmetric positive definite, A = LLᵀ. Crucial for efficient
optimization algorithms.
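Both factorizations are easy to sanity-check numerically. Below is a minimal NumPy sketch: the no-pivot LU routine `lu_no_pivot` is an illustrative teaching aid (not a production solver), applied to small, well-conditioned example matrices.

```python
# Minimal sketch: LU (Doolittle, no pivoting) and Cholesky with NumPy.
import numpy as np

def lu_no_pivot(A):
    """Return L (unit lower triangular) and U with A = L @ U."""
    n = A.shape[0]
    L, U = np.eye(n), A.astype(float)
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]      # store the multiplier
            U[i, :] -= L[i, k] * U[k, :]     # eliminate below the pivot
    return L, U

A = np.array([[2.0, 1.0], [6.0, 8.0]])
L, U = lu_no_pivot(A)                        # L = [1 0; 3 1], U = [2 1; 0 5]

S = np.array([[4.0, 2.0], [2.0, 2.0]])       # symmetric positive definite
C = np.linalg.cholesky(S)                    # lower triangular, S = C @ C.T
```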
Module 2: Eigenvalues, SVD & PCA
Syllabus Focus: Eigen decomposition, SVD, Principal Component Analysis (PCA), positive
definite matrices.
Key Concepts
1. Eigenvalues (λ): Roots of det(A − λI) = 0. They determine the stability and growth of
a system.
2. Singular Value Decomposition (SVD):
The “Fundamental Theorem of Data Science.” Every matrix A decomposes into:
A = UΣVᵀ
o U: Left singular vectors (eigenvectors of AAᵀ).
o Σ: Diagonal matrix of singular values (square roots of the eigenvalues of AᵀA).
o V: Right singular vectors (eigenvectors of AᵀA).
3. PCA (Principal Component Analysis):
PCA is the SVD of the centered data matrix (equivalently, the eigendecomposition of the
covariance matrix). The columns of V (right singular vectors) are the principal components.
Practice Problem Set: Modules 1 & 2 (25 Questions)
Topic: Decompositions (LU, Cholesky)
Q1: Perform LU decomposition on A = [2 1; 6 8].
• Solution:
1. Pivot is 2. Multiplier l21 = 6/2 = 3.
2. Row operation: R2 ← R2 − 3R1 ⇒ U = [2 1; 0 5].
3. L stores the multiplier: L = [1 0; 3 1].
4. Result: A = LU with L = [1 0; 3 1], U = [2 1; 0 5].
Q2: Solve Ax = b for the matrix in Q1 with b = (5, 25) using forward/back
substitution.
• Solution:
1. Solve Lc = b: c1 = 5; 3(5) + c2 = 25 ⇒ c2 = 10. So c = (5, 10).
2. Solve Ux = c: 5x2 = 10 ⇒ x2 = 2.
3. 2x1 + x2 = 5 ⇒ 2x1 = 3 ⇒ x1 = 1.5.
Answer: x = (1.5, 2).
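The two triangular solves above can be coded directly; the helpers `forward_sub` and `back_sub` are illustrative, not a library API:

```python
# Worked check of Q1/Q2 with explicit triangular solves.
import numpy as np

def forward_sub(L, b):
    c = np.zeros_like(b, dtype=float)
    for i in range(len(b)):
        c[i] = (b[i] - L[i, :i] @ c[:i]) / L[i, i]
    return c

def back_sub(U, c):
    x = np.zeros_like(c, dtype=float)
    for i in reversed(range(len(c))):
        x[i] = (c[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
    return x

L = np.array([[1.0, 0.0], [3.0, 1.0]])
U = np.array([[2.0, 1.0], [0.0, 5.0]])
b = np.array([5.0, 25.0])
c = forward_sub(L, b)   # c = (5, 10)
x = back_sub(U, c)      # x = (1.5, 2)
```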
Q3: Find the Cholesky decomposition of A = [4 2; 2 2].
• Solution: Target A = LLᵀ.
l11 = √4 = 2.
l21 = 2/2 = 1.
l22 = √(a22 − l21²) = √(2 − 1) = 1.
Result: L = [2 0; 1 1].
Q4: Why must pivots be positive for Cholesky decomposition?
• Solution: Cholesky involves square roots of pivots. If a pivot is negative, the result
is imaginary, implying the matrix is not Positive Definite.
Q5: Explain the permutation matrix P in PA = LU.
• Solution: If a zero appears in a pivot position, rows must be swapped. P records these
row swaps.
Topic: Subspaces & Rank
Q6: Describe the column space C(A) of the 3×2 matrix A = [1 0; 0 1; 0 0].
• Solution: It is the plane spanned by (1, 0, 0) and (0, 1, 0), which is
the xy-plane in R³.
Q7: Find a basis for the subspace x + y + z = 0.
• Solution: x = −y − z, with free variables y and z.
Basis vectors: v1 = (−1, 1, 0) and v2 = (−1, 0, 1).
Q8: Calculate the rank of the 3×3 all-ones matrix A = [1 1 1; 1 1 1; 1 1 1].
• Solution: All rows are identical. Elimination leaves one non-zero row. Rank = 1.
Q9: If rank(A) = n (the number of columns), what is the null space?
• Solution: The null space contains only the zero vector {0}; the columns are
independent.
Q10: Find the dimension of the left null space of a 5×3 matrix with rank 3.
• Solution: dim N(Aᵀ) = m − r = 5 − 3 = 2.
Topic: Eigenvalues & Eigenvectors
Q11: Find the eigenvalues of A = [3 1; 1 3].
• Solution: det(A − λI) = (3 − λ)² − 1 = 0 ⇒ 3 − λ = ±1.
λ = 2, 4.
Q12: Find the eigenvectors for Q11.
• Solution:
λ = 4: [−1 1; 1 −1]x = 0 ⇒ x1 = x2 ⇒ v1 = (1, 1).
λ = 2: [1 1; 1 1]x = 0 ⇒ x1 = −x2 ⇒ v2 = (1, −1).
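A quick numeric cross-check of Q11/Q12 (NumPy's `eigh` is for symmetric matrices and returns eigenvalues in ascending order; eigenvector signs are arbitrary):

```python
# Numeric cross-check of Q11/Q12.
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 3.0]])
w, V = np.linalg.eigh(A)
# w = [2, 4]; columns of V are unit eigenvectors, ±(1,−1)/√2 and ±(1,1)/√2
```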
Q13: Given that A has eigenvalues λ, find the eigenvalues of A².
• Solution: They are λ², with the same eigenvectors: A²v = A(λv) = λ²v.
Q14: Diagonalize A = [1 0; 0 2].
• Solution: Already diagonal: P = I, D = A.
Q15: Can A = [1 1; 0 1] be diagonalized?
• Solution: No. The eigenvalues are 1, 1, but there is only one independent
eigenvector, (1, 0): not enough for a basis.
Topic: SVD & PCA
Q16: Calculate AᵀA for A = [1 1; 0 1].
• Solution: AᵀA = [1 0; 1 1][1 1; 0 1] = [1 1; 1 2].
Q17: If the eigenvalues of AᵀA are 9 and 4, what are the singular values of A?
• Solution: σ = √λ. The singular values are 3 and 2.
Q18: What are the dimensions of U, Σ, V for a 100×5 matrix (full SVD)?
• Solution: U: 100×100, Σ: 100×5, V: 5×5.
Q19: Given data points (1, 1), (2, 2), (3, 3), find the first principal
component direction.
• Solution: The data lies on the line y = x. The direction is the unit vector (1/√2)(1, 1).
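Q19 can be verified numerically (the sign of a principal component is arbitrary):

```python
# Verifying Q19: the first principal component of (1,1), (2,2), (3,3).
import numpy as np

X = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
Xc = X - X.mean(axis=0)                      # center the data
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Vt[0]                                  # ±(1/√2, 1/√2); sign is arbitrary
```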
Q20: Why center data before PCA?
• Solution: PCA measures variance. If not centered, the first component captures the
mean offset rather than the data’s internal spread.
Topic: Positive Definiteness
Q21: Test whether A = [2 −1; −1 2] is positive definite.
• Solution: Determinant = 4 − 1 = 3 > 0 and trace = 4 > 0, so both eigenvalues are
positive. Yes, PD.
Q22: What does positive definite mean for the energy xᵀAx?
• Solution: xᵀAx > 0 for all non-zero vectors x.
Q23: Why is Positive Definiteness important for the Hessian matrix?
• Solution: If the Hessian is PD at a critical point, that point is a strict local minimum;
if the Hessian is PD everywhere, the function is strictly convex.
Q24: Is AᵀA always positive definite?
• Solution: No, it is positive semidefinite. It is PD exactly when the columns of A are
independent.
Q25: Which is PD: [1 2; 2 1] or [3 1; 1 3]?
• Solution: The first has det = 1 − 4 = −3 < 0 (not PD). The second has
det = 8 > 0 with positive pivots (PD). The second one.
PART II: ALGORITHMS & PROGRAMMING (Modules 3 & 4)
Module 3: Computations with Large Matrices
Syllabus Focus: Norms, least squares, QR decomposition, iterative methods (Arnoldi,
Conjugate Gradient).
Key Concepts
1. Least Squares: Solving inconsistent systems Ax = b by
minimizing ||Ax − b||². Solution: the normal equations AᵀAx̂ = Aᵀb.
2. QR Decomposition: A = QR (Q is orthogonal, R is upper triangular). Used
for numerically stable least squares.
3. Iterative Methods: For huge matrices where forming A⁻¹ is impossible.
o Conjugate Gradient (CG): For symmetric positive definite systems.
o Krylov Subspaces: Solvers search within span{b, Ab, A²b, ...}.
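A bare-bones conjugate gradient solver makes the idea concrete. This is a teaching sketch without preconditioning; in practice a library routine (e.g., SciPy's sparse solvers) would be used:

```python
# Teaching sketch of Conjugate Gradient for SPD systems.
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x                  # residual
    p = r.copy()                   # search direction
    rs = r @ r
    for _ in range(max_iter or len(b)):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p  # next A-conjugate direction
        rs = rs_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])   # symmetric positive definite
b = np.array([1.0, 2.0])
x = conjugate_gradient(A, b)
```

In exact arithmetic CG reaches the solution of an n×n SPD system in at most n iterations, which is why the default iteration cap is `len(b)`.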
Module 4: Linear Programming (LPP)
Syllabus Focus: Simplex method, duality, Big-M method.
Key Concepts
1. Standard Form: Maximize cᵀx subject to Ax = b, x ≥ 0.
2. Simplex Method: Moves along vertices of the feasible region (polytope) to find the
optimal corner.
3. Duality: Every LPP has a Dual. Dual variables represent shadow prices (value of
resources).
Practice Problem Set: Modules 3 & 4 (25 Questions)
Topic: Norms & QR (Module 3)
Q26: Calculate the L1, L2, and L∞ norms of x = (3, −4, 0).
• Solution: L1 = 7, L2 = 5, L∞ = 4.
Q27: Calculate the condition number κ(A) of diag(100, 2).
• Solution: κ = σmax/σmin = 100/2 = 50.
Q28: Perform Gram–Schmidt on a = (1, 1), b = (1, 0).
• Solution:
q1 = a/||a|| = (1/√2, 1/√2).
B = b − (b·q1)q1 = (1, 0) − 0.5(1, 1) = (0.5, −0.5).
q2 = B/||B|| = (1/√2, −1/√2).
Q29: Form the QR matrices from Q28.
• Solution: Q = [1/√2 1/√2; 1/√2 −1/√2], R = [√2 1/√2; 0 1/√2].
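The Gram–Schmidt computation of Q28/Q29 can be packaged as a tiny QR routine. Classical Gram–Schmidt is fine for small, well-conditioned examples like this one; modified Gram–Schmidt or Householder reflections are preferred numerically:

```python
# Classical Gram–Schmidt QR, reproducing Q28/Q29.
import numpy as np

def gram_schmidt_qr(A):
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]   # projection coefficient
            v -= R[i, j] * Q[:, i]        # subtract the projection
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]
    return Q, R

A = np.column_stack([(1.0, 1.0), (1.0, 0.0)])   # columns a, b from Q28
Q, R = gram_schmidt_qr(A)
```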
Q30: Why is QR preferred over the normal equations?
• Solution: The normal equations square the condition number
(κ(AᵀA) = κ(A)²), amplifying numerical error; QR works
with κ(A) directly.
Topic: Least Squares & Iterative Methods (Module 3)
Q31: Set up the normal equations for A = [1 0; 1 1; 1 2], b = (6, 0, 0).
• Solution: AᵀAx̂ = Aᵀb ⇒ [3 3; 3 5]x̂ = (6, 0).
Q32: Solve the system in Q31.
• Solution: From 3x1 + 5x2 = 0, x1 = −(5/3)x2.
Substituting: 3(−(5/3)x2) + 3x2 = 6 ⇒ −2x2 = 6 ⇒ x2 = −3.
Then x1 = 5, so x̂ = (5, −3).
Q33: Calculate the projection of b onto C(A) using Q32.
• Solution: p = Ax̂ = (5, 2, −1).
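Q31–Q33 can be checked in a few lines, solving the normal equations directly (for ill-conditioned problems one would use QR or a least-squares routine instead, per Q30):

```python
# Checking Q31–Q33: normal equations and the projection.
import numpy as np

A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])
xhat = np.linalg.solve(A.T @ A, A.T @ b)   # normal equations
p = A @ xhat                               # projection of b onto C(A)
```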
Q34: What is the computational complexity of Conjugate Gradient vs Gaussian
elimination?
• Solution: CG is O(m·k) (where m is the number of non-zeros and k the number of
iterations). Gaussian elimination is O(n³). CG is far faster for sparse matrices.
Q35: Explain “randomized matrix multiplication”.
• Solution: Approximating AB by sampling columns of A and the matching rows of B with
probability proportional to the product of their norms, rescaling each sample so the
estimate is unbiased.
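A hedged sketch of the idea: sizes, the random seed, and the sample count k are illustrative; each sampled outer product is rescaled by 1/(k·pᵢ) so the estimator is unbiased.

```python
# Sketch of randomized matrix multiplication by norm-weighted sampling.
import numpy as np

rng = np.random.default_rng(0)
A = rng.uniform(size=(40, 300))
B = rng.uniform(size=(300, 30))

norms = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
probs = norms / norms.sum()                  # sampling distribution
k = 150                                      # number of sampled pairs
idx = rng.choice(A.shape[1], size=k, p=probs)
approx = sum(np.outer(A[:, i], B[i, :]) / (k * probs[i]) for i in idx)

rel_err = np.linalg.norm(approx - A @ B) / np.linalg.norm(A @ B)
```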
Q36: What is the Rayleigh quotient?
• Solution: R(x) = xᵀAx / xᵀx. Its maximum value is λmax.
Q37: Why use the L1 norm (Lasso) in regression?
• Solution: It induces sparsity (sets coefficients to exactly zero), performing feature
selection.
Q38: Define a Krylov subspace.
• Solution: The space spanned by {b, Ab, A²b, ...}.
Topic: Linear Programming (Module 4)
Q39: Formulate:
Maximize 3x + 5y s.t. x ≤ 4, 2y ≤ 12, 3x + 2y ≤ 18.
• Solution: Max Z = 3x + 5y s.t. x ≤ 4, y ≤ 6, 3x + 2y ≤ 18, x, y ≥ 0.
Q40: Convert Q39 to standard form (slack variables).
• Solution: Max 3x + 5y + 0s1 + 0s2 + 0s3.
Constraints: x + s1 = 4; 2y + s2 = 12; 3x + 2y + s3 = 18.
Q41: Find the dual of Q39.
• Solution: Min 4y1 + 12y2 + 18y3
s.t. y1 + 3y3 ≥ 3; 2y2 + 2y3 ≥ 5; y1, y2, y3 ≥ 0.
Q42: Solve Q39 graphically (find the optimal vertex).
• Solution: Intersect y = 6 and 3x + 2y = 18:
3x + 12 = 18 ⇒ 3x = 6 ⇒ x = 2. Point (2, 6).
Check x ≤ 4: valid.
Z = 3(2) + 5(6) = 36.
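Q42 can be double-checked by brute force: enumerate intersections of the constraint boundaries, keep the feasible ones, and evaluate Z at each. This is a tiny sketch exploiting "the optimum is at a vertex", not the Simplex method:

```python
# Brute-force vertex check of Q39/Q42.
import numpy as np
from itertools import combinations

# Each row (a1, a2, rhs) encodes the constraint a1*x + a2*y <= rhs.
C = np.array([
    [1.0, 0.0, 4.0],    # x <= 4
    [0.0, 2.0, 12.0],   # 2y <= 12
    [3.0, 2.0, 18.0],   # 3x + 2y <= 18
    [-1.0, 0.0, 0.0],   # x >= 0
    [0.0, -1.0, 0.0],   # y >= 0
])

best, best_z = None, -np.inf
for i, j in combinations(range(len(C)), 2):
    M = C[[i, j], :2]
    if abs(np.linalg.det(M)) < 1e-12:
        continue                                  # parallel boundaries
    v = np.linalg.solve(M, C[[i, j], 2])          # boundary intersection
    if np.all(C[:, :2] @ v <= C[:, 2] + 1e-9):    # feasible vertex
        z = 3 * v[0] + 5 * v[1]
        if z > best_z:
            best, best_z = v, z
```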
Q43: How do you identify the Pivot Column in Simplex?
• Solution: The column with the most negative coefficient in the objective row (Max
problem).
Q44: How do you identify the Pivot Row?
• Solution: The row with the minimum positive ratio (Solution / Pivot Column value).
Q45: What is a “shadow price”?
• Solution: The value of the dual variable. It equals the increase in Z if the constraint
is relaxed by 1 unit.
Q46: Interpret “complementary slackness”.
• Solution: If a primal constraint has slack (> 0), the corresponding dual price is 0.
Q47: What is the Big-M method?
• Solution: It adds artificial variables with large penalties (M) to
handle ≥ and = constraints in Simplex.
Q48: If Primal is Unbounded, the Dual is…?
• Solution: Infeasible.
Q49: What is a convex hull?
• Solution: The set of all convex combinations of a set of points. A bounded LPP
feasible region is the convex hull of its vertices.
Q50: Verify the vertex (0, 0) for Max x + y s.t. x + y ≤ 1, x, y ≥ 0.
• Solution: Z = 0. The neighboring vertex (1, 0) gives Z = 1. Not optimal.
PART III: CONVEX OPTIMIZATION & DUALITY (Modules 5 & 6)
Module 5: Convex Optimization
Syllabus Focus: Convex sets/functions, Gradient Descent, Newton’s Method.
Key Concepts
1. Convexity: A function is convex if its Hessian is positive semidefinite everywhere
(∇²f ⪰ 0). Local min = global min.
2. Gradient Descent: xnew = x − t∇f(x). First-order method.
3. Newton’s Method: xnew = x − H⁻¹∇f(x). Second-order method
(uses curvature). Converges much faster near the optimum but each step is
computationally heavy.
Module 6: Duality & KKT
Syllabus Focus: Lagrange multipliers, KKT conditions, sensitivity.
Key Concepts
1. Lagrange Function: L(x, λ) = f(x) + Σ λi gi(x).
2. KKT Conditions: Necessary conditions for optimality.
o Stationarity (∇L = 0).
o Primal/dual feasibility.
o Complementary slackness (λi gi(x) = 0).
3. Strong Duality: When Primal Min = Dual Max. Holds if Slater’s Condition is met
(strictly feasible point exists).
Practice Problem Set: Modules 5 & 6 (25 Questions)
Topic: Convexity & Algorithms (Module 5)
Q51: Prove the unit ball ||x|| ≤ 1 is convex.
• Solution: By the triangle inequality,
||θx + (1 − θ)y|| ≤ θ||x|| + (1 − θ)||y|| ≤ θ + (1 − θ) = 1.
Q52: Is f(x) = x³ convex?
• Solution: f″(x) = 6x, which is negative for x < 0. No.
Q53: Check the convexity of f(x, y) = x² + y² + 2xy.
• Solution: f = (x + y)². The Hessian [2 2; 2 2] has eigenvalues 4 and 0, both ≥ 0. Yes, convex.
Q54: Perform one gradient descent step for f(x) = x² − 4x at x0 = 0, t = 0.5.
• Solution: ∇f = 2x − 4, so ∇f(0) = −4.
x1 = 0 − 0.5(−4) = 2.
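Q54 in code (note that t = 0.5 equals 1/f″ for this quadratic, so a single step lands exactly on the minimizer):

```python
# Reproducing Q54: gradient descent on f(x) = x² − 4x with t = 0.5.
def grad(x):
    return 2 * x - 4          # ∇f(x)

t = 0.5
x = 0.0
x = x - t * grad(x)           # one step: x = 0 − 0.5·(−4) = 2
# grad(2) = 0, so further iterations leave x unchanged at the minimizer.
for _ in range(10):
    x = x - t * grad(x)
```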
Q55: Calculate the Newton step for f(x) = eˣ + x at x = 0.
• Solution: f′(0) = e⁰ + 1 = 2, f″(0) = e⁰ = 1. Step Δx = −f′/f″ = −2, so xnew = −2.
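Q55 in code:

```python
# Reproducing Q55: one Newton step for f(x) = eˣ + x at x = 0
# (step: Δx = −f′(x)/f″(x)).
import math

x0 = 0.0
fp = math.exp(x0) + 1.0       # f′(0) = 2
fpp = math.exp(x0)            # f″(0) = 1
x1 = x0 - fp / fpp            # x1 = −2
```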
Q56: Why does Newton’s method converge faster?
• Solution: It approximates the function as a quadratic (parabola) rather than a plane,
adjusting for curvature.
Q57: Explain backtracking line search.
• Solution: Dynamically reducing the step size t until the decrease in f(x) is
sufficient (Armijo rule).
Q58: What is a posynomial (geometric programming)?
• Solution: A sum of monomials c·x1^a1 ··· xn^an with c > 0.
Q59: Condition Number problem in Gradient Descent?
• Solution: High condition numbers cause “zig-zagging” in narrow valleys, slowing
convergence.
Q60: Epigraph definition.
• Solution: The set of points lying on or above the graph of f. f is convex iff its
epigraph is convex.
Q61: Steepest descent in the L1 norm.
• Solution: Updates the single coordinate with the largest-magnitude partial derivative
(coordinate descent).
Q62: Is log(eˣ + eʸ) convex?
• Solution: Yes; the log-sum-exp function is convex.
Q63: Geometric interpretation of Gradient.
• Solution: Vector pointing in the direction of steepest ascent.
Topic: Duality & KKT (Module 6)
Q64: Find the dual function for Min x² s.t. x ≤ −2.
• Solution: Write the constraint as x + 2 ≤ 0, so L(x, λ) = x² + λ(x + 2).
Minimizing over x gives x = −λ/2, hence g(λ) = −λ²/4 + 2λ.
Q65: Solve Q64 for the optimal λ.
• Solution: Maximize g(λ): g′(λ) = −λ/2 + 2 = 0 ⇒ λ = 4.
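A numeric check of Q64/Q65, confirming the zero duality gap (the grid search over λ is illustrative; the closed form gives λ* = 4 directly):

```python
# Numeric check of Q64/Q65: g(λ) = −λ²/4 + 2λ peaks at λ = 4, and its
# maximum equals the primal optimum f(−2) = 4, i.e. zero duality gap
# (the problem is convex with a strictly feasible point).
import numpy as np

lam = np.linspace(0.0, 8.0, 100001)
g = -lam**2 / 4 + 2 * lam
lam_star = lam[np.argmax(g)]
dual_opt = g.max()
primal_opt = (-2.0) ** 2      # x* = −2 is the feasible point closest to 0
```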
Q66: Duality gap definition.
• Solution: Primal optimal − dual optimal (≥ 0 by weak duality). Zero for convex problems
with strong duality.
Q67: Slater’s Condition.
• Solution: For a convex problem, if a strictly feasible point exists (gi(x) < 0 for all i),
strong duality holds.
Q68: Check KKT for Min x s.t. x ≥ 1 at x = 0.
• Solution: Primal feasibility fails (0 is not ≥ 1). KKT is not satisfied.
Q69: Write the KKT stationarity condition for Min cᵀx s.t. Ax = b.
• Solution: c + Aᵀν = 0.
Q70: Why is λ called a “price”?
• Solution: λ = ∂p*/∂b (up to sign convention): the rate of change of the optimal cost
with respect to the constraint level.
Q71: Sensitivity analysis: if λ = 5, what happens if the constraint is relaxed by 1 unit?
• Solution: The cost improves (decreases) by approximately 5.
Q72: Saddle point property.
• Solution: The optimum is a saddle point of the Lagrangian: a minimum over x and a
maximum over λ.
Q73: Necessary vs sufficient KKT.
• Solution: Necessary for all local optima (under constraint qualification). Sufficient
only for convex problems.
Q74: Dual of the SVM.
• Solution: Maximizes the margin using only inner products xiᵀxj, which enables the
kernel trick.
Q75: If λi > 0 at the optimum, what do we know about the constraint?
• Solution: The constraint is active (equality holds: gi(x) = 0).