Linear Algebra Study Guide for Data Science

This comprehensive study guide for MAT4101 covers Linear Algebra and Optimization for Data Science, targeting MSc Data Science students. It includes core modules on matrix foundations, eigenvalues, SVD, PCA, algorithms for large matrices, and linear programming, alongside practice problems for each module. Key concepts such as LU decomposition, QR decomposition, and the Simplex method are emphasized to enhance understanding and application in data science contexts.

Uploaded by

pushpakam
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
57 views16 pages

Linear Algebra Study Guide for Data Science

This comprehensive study guide for MAT4101 covers Linear Algebra and Optimization for Data Science, targeting MSc Data Science students. It includes core modules on matrix foundations, eigenvalues, SVD, PCA, algorithms for large matrices, and linear programming, alongside practice problems for each module. Key concepts such as LU decomposition, QR decomposition, and the Simplex method are emphasized to enhance understanding and application in data science contexts.

Uploaded by

pushpakam
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

COMPREHENSIVE STUDY GUIDE:

MAT4101
Linear Algebra and Optimization for Data Science

Target Audience: MSc Data Science

Reference Texts:

• Linear Algebra and Learning from Data, Gilbert Strang (2019)

• Convex Optimization, Boyd & Vandenberghe (2013)

• Mathematics for Machine Learning, Deisenroth et al. (2019)

PART I: LINEAR ALGEBRA CORE (Modules 1 & 2)
Module 1: Matrix Foundations & Subspaces

Syllabus Focus: Matrix multiplication, LU/Cholesky Decompositions, Rank, Vector Spaces, The Four Fundamental Subspaces.

Key Concepts

1. The Four Fundamental Subspaces:

For any m×n matrix A with rank r:

o Column Space (C(A)): Dimension r. The space of all possible outputs b in Ax = b.

o Row Space (C(Aᵀ)): Dimension r. The space spanned by the rows.

o Null Space (N(A)): Dimension n − r. Solutions to Ax = 0.

o Left Null Space (N(Aᵀ)): Dimension m − r.

o Orthogonality: N(A) ⊥ C(Aᵀ).

2. LU Decomposition:

Factoring A = LU (Lower triangular × Upper triangular). Once factored, Ax = b is solved efficiently by forward/back substitution for any right-hand side, without forming A⁻¹.

3. Cholesky Decomposition:

If A is Symmetric Positive Definite, A = LLᵀ. Crucial for efficient optimization algorithms.
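Both factorizations are one-liners in SciPy. A minimal sketch (the 2×2 matrix is an illustrative example): factor once with `lu_factor`, then reuse the factors for any right-hand side with `lu_solve`.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

# Hypothetical example; lu_factor computes PA = LU with partial pivoting.
A = np.array([[2.0, 1.0],
              [6.0, 8.0]])
lu_piv = lu_factor(A)      # factor once (O(n^3))

b = np.array([5.0, 25.0])
x = lu_solve(lu_piv, b)    # reuse the factors for each new b (O(n^2))
```

This is exactly why LU is preferred to computing A⁻¹: each new b costs only two triangular solves.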

Module 2: Eigenvalues, SVD & PCA

Syllabus Focus: Eigen Decomposition, SVD, Principal Component Analysis (PCA), Positive Definite Matrices.

Key Concepts

1. Eigenvalues (λ): Roots of det(A−λI)=0. They determine the stability and growth of
a system.

2. Singular Value Decomposition (SVD):

The “Fundamental Theorem of Data Science.” Every matrix A decomposes into:

A = UΣVᵀ

o U: Left Singular Vectors (eigenvectors of AAᵀ).

o Σ: Diagonal matrix of Singular Values (square roots of the eigenvalues of AᵀA).

o V: Right Singular Vectors (eigenvectors of AᵀA).


3. PCA (Principal Component Analysis):

PCA is SVD applied to the centered data matrix (equivalently, eigen decomposition of its covariance matrix). The columns of V (Right Singular Vectors) are the Principal Components.
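A sketch of this equivalence (random data, names illustrative): center, take the SVD, and the squared singular values scaled by 1/(n−1) match the covariance eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))              # hypothetical data matrix

Xc = X - X.mean(axis=0)                    # center first
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

components = Vt                            # rows = principal components
explained_var = s**2 / (len(X) - 1)        # eigenvalues of the covariance matrix

cov = Xc.T @ Xc / (len(X) - 1)             # covariance route, for comparison
eigvals = np.linalg.eigvalsh(cov)[::-1]    # descending order
```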

Practice Problem Set: Modules 1 & 2 (25 Questions)

Topic: Decompositions (LU, Cholesky)

Q1: Perform LU Decomposition on A = [2 1; 6 8].

• Solution:

1. Pivot is 2. Multiplier l21 = 6/2 = 3.

2. Row operation: R2 ← R2 − 3R1 ⇒ U = [2 1; 0 5].

3. L stores the multiplier: L = [1 0; 3 1].

4. Result: L = [1 0; 3 1], U = [2 1; 0 5].

Q2: Solve Ax = b for the matrix in Q1 with b = (5, 25) using forward/back substitution.

• Solution:

1. Solve Lc = b: [1 0; 3 1]c = (5, 25) ⇒ c1 = 5; 3(5) + c2 = 25 ⇒ c2 = 10.

2. Solve Ux = c: [2 1; 0 5]x = (5, 10) ⇒ 5x2 = 10 ⇒ x2 = 2.

3. 2x1 + 2 = 5 ⇒ 2x1 = 3 ⇒ x1 = 1.5.

Answer: x = (1.5, 2).
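The two triangular solves above can be written out directly; a minimal sketch for this 2×2 case:

```python
import numpy as np

L = np.array([[1.0, 0.0],
              [3.0, 1.0]])
U = np.array([[2.0, 1.0],
              [0.0, 5.0]])
b = np.array([5.0, 25.0])

# Forward substitution: solve L c = b
c1 = b[0]                           # c1 = 5
c2 = b[1] - L[1, 0] * c1            # 25 - 3*5 = 10

# Back substitution: solve U x = c
x2 = c2 / U[1, 1]                   # 10/5 = 2
x1 = (c1 - U[0, 1] * x2) / U[0, 0]  # (5 - 2)/2 = 1.5
```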

Q3: Find the Cholesky Decomposition of A = [4 2; 2 2].

• Solution: Target A = LLᵀ.

l11 = √4 = 2.

l21 = 2/2 = 1.

l22 = √(a22 − l21²) = √(2 − 1) = 1.

Result: L = [2 0; 1 1].
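NumPy confirms the hand computation (`np.linalg.cholesky` returns the lower-triangular factor):

```python
import numpy as np

A = np.array([[4.0, 2.0],
              [2.0, 2.0]])
L = np.linalg.cholesky(A)   # lower-triangular L with A = L @ L.T
```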

Q4: Why must pivots be positive for Cholesky decomposition?

• Solution: Cholesky takes square roots of the pivots. If a pivot is negative, the square root is imaginary, implying the matrix is not Positive Definite.

Q5: Explain the Permutation Matrix P in PA = LU.

• Solution: If a zero appears in a pivot position, we must swap rows. P records these row swaps.

Topic: Subspaces & Rank

Q6: Describe the Column Space C(A) of A = [1 0; 0 1; 0 0].

• Solution: It is the plane spanned by (1, 0, 0) and (0, 1, 0), i.e. the xy-plane in R³.

Q7: Find a basis for the subspace x + y + z = 0.

• Solution: x = −y − z, with free variables y, z.

Basis vectors: v1 = (−1, 1, 0) and v2 = (−1, 0, 1).


Q8: Calculate the Rank of A = [1 1 1; 1 1 1; 1 1 1].

• Solution: All rows are identical. Elimination leaves one non-zero row. Rank = 1.

Q9: If Rank(A) = n (number of columns), what is the Null Space?

• Solution: The Null Space contains only the zero vector {0}. The columns are independent.

Q10: Find the dimension of the Left Null Space for a 5×3 matrix with rank 3.

• Solution: Dim N(Aᵀ) = m − r = 5 − 3 = 2.

Topic: Eigenvalues & Eigenvectors

Q11: Find the eigenvalues of A = [3 1; 1 3].

• Solution: det(A − λI) = (3 − λ)² − 1 = 0 ⇒ 3 − λ = ±1.

λ = 2, 4.
Q12: Find the eigenvectors for Q11.

• Solution:

λ = 4: [−1 1; 1 −1]x = 0 ⇒ x1 = x2 ⇒ v1 = (1, 1).

λ = 2: [1 1; 1 1]x = 0 ⇒ x1 = −x2 ⇒ v2 = (1, −1).
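A quick numerical check (`np.linalg.eigh` is for symmetric matrices and returns eigenvalues in ascending order):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])
vals, vecs = np.linalg.eigh(A)   # ascending: [2, 4]
v_top = vecs[:, -1]              # eigenvector for lambda = 4, proportional to (1, 1)
```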

Q13: Find the eigenvalues of A² given that the eigenvalues of A are λ.

• Solution: The eigenvalues are λ².

Q14: Diagonalize A = [1 0; 0 2].

• Solution: Already diagonal. P = I, D = A.

Q15: Can A = [1 1; 0 1] be diagonalized?

• Solution: No. The eigenvalues are 1, 1, with only one eigenvector (1, 0). Not enough for a basis.

Topic: SVD & PCA

Q16: Calculate AᵀA for A = [1 1; 0 1].

• Solution: [1 0; 1 1][1 1; 0 1] = [1 1; 1 2].

Q17: If the eigenvalues of AᵀA are 9 and 4, what are the singular values of A?

• Solution: σ = √λ. The singular values are 3 and 2.
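The relation σ = √λ is easy to verify numerically, e.g. on the matrix from Q16:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
s = np.linalg.svd(A, compute_uv=False)    # singular values, descending
lam = np.linalg.eigvalsh(A.T @ A)[::-1]   # eigenvalues of A^T A, descending
```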

Q18: What are the dimensions of U, Σ, V for a 100×5 matrix?

• Solution: U: 100×100, Σ: 100×5, V: 5×5.

Q19: Given the data points (1,1), (2,2), (3,3), find the first Principal Component direction.

• Solution: The data lies on the line y = x. The direction is the unit vector (1/√2)(1, 1).

Q20: Why center data before PCA?

• Solution: PCA measures variance. If not centered, the first component captures the
mean offset rather than the data’s internal spread.
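Both points can be checked in a few lines (data from Q19):

```python
import numpy as np

X = np.array([[1.0, 1.0],
              [2.0, 2.0],
              [3.0, 3.0]])
Xc = X - X.mean(axis=0)     # centering removes the mean offset
_, s, Vt = np.linalg.svd(Xc)
pc1 = Vt[0]                 # first principal component, up to sign
```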

Topic: Positive Definiteness

Q21: Test whether A = [2 −1; −1 2] is Positive Definite.

• Solution: Determinant = 4 − 1 = 3 (> 0). Trace = 4 (> 0). Yes, PD.

Q22: What does Positive Definite mean for the energy xᵀAx?

• Solution: xᵀAx > 0 for all non-zero vectors x.

Q23: Why is Positive Definiteness important for the Hessian matrix?


• Solution: If the Hessian is PD at a critical point, that point is a strict local minimum (the function is locally strictly convex).

Q24: Is AᵀA always Positive Definite?

• Solution: No, it is Positive Semi-Definite. It is PD only if the columns of A are independent.

Q25: Which is PD: [1 2; 2 1] or [3 1; 1 3]?

• Solution: The first has det = 1 − 4 = −3 (not PD). The second has det = 8 (PD). The second one.

PART II: ALGORITHMS & PROGRAMMING (Modules 3 & 4)
Module 3: Computations with Large Matrices

Syllabus Focus: Norms, Least Squares, QR Decomposition, Iterative Methods (Arnoldi, Conjugate Gradient).

Key Concepts

1. Least Squares: Solving inconsistent systems Ax = b by minimizing ||Ax − b||². Solution: AᵀAx̂ = Aᵀb.

2. QR Decomposition: A = QR (Q is orthogonal, R is upper triangular). Used for numerically stable Least Squares.

3. Iterative Methods: For huge matrices where forming A⁻¹ is impossible.

o Conjugate Gradient (CG): For Symmetric Positive Definite systems.

o Krylov Subspaces: Solvers search within span{b, Ab, A²b, …}.
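A sketch of CG on a sparse SPD system; the 1-D Laplacian here is a stand-in example, not from the syllabus:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import cg

n = 100
# Tridiagonal Laplacian: symmetric positive definite and very sparse
A = diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format='csr')
b = np.ones(n)

x, info = cg(A, b)   # info == 0 means the iteration converged
```

Each CG iteration needs only a matrix-vector product, which is why it scales to matrices far too large to factor.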


Module 4: Linear Programming (LPP)

Syllabus Focus: Simplex Method, Duality, Big-M Method.

Key Concepts

1. Standard Form: Maximize cᵀx subject to Ax = b, x ≥ 0.

2. Simplex Method: Moves along vertices of the feasible region (polytope) to find the
optimal corner.

3. Duality: Every LPP has a Dual. Dual variables represent shadow prices (value of
resources).

Practice Problem Set: Modules 3 & 4 (25 Questions)

Topic: Norms & QR (Module 3)

Q26: Calculate the L1, L2, and L∞ norms of x = (3, −4, 0).

• Solution: L1 = 7, L2 = 5, L∞ = 4.

Q27: Calculate the Condition Number κ(A) for diag(100, 2).

• Solution: κ = σmax/σmin = 100/2 = 50.
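Both computations map directly onto `np.linalg.norm` and `np.linalg.cond`:

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0])
l1 = np.linalg.norm(x, 1)          # 3 + 4 + 0 = 7
l2 = np.linalg.norm(x, 2)          # sqrt(9 + 16) = 5
linf = np.linalg.norm(x, np.inf)   # max |x_i| = 4

A = np.diag([100.0, 2.0])
kappa = np.linalg.cond(A, 2)       # sigma_max / sigma_min = 50
```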

Q28: Perform Gram-Schmidt on a = (1, 1), b = (1, 0).

• Solution:

q1 = a/||a|| = (1/√2, 1/√2).

B = b − (b·q1)q1 = (1, 0) − 0.5(1, 1) = (0.5, −0.5).

q2 = B/||B|| = (1/√2, −1/√2).
Q29: Form the QR matrices from Q28.

• Solution: Q = [1/√2 1/√2; 1/√2 −1/√2], R = [√2 1/√2; 0 1/√2].
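`np.linalg.qr` reproduces this up to sign conventions (Householder QR may flip signs of columns of Q and rows of R):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 0.0]])   # columns a = (1, 1), b = (1, 0)
Q, R = np.linalg.qr(A)       # Q orthogonal, R upper triangular
```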
Q30: Why is QR preferred over the Normal Equations?

• Solution: The Normal Equations square the condition number (κ(AᵀA) = κ(A)²), roughly doubling the digits lost to rounding. QR works with κ(A) directly.

Topic: Least Squares & Iterative Methods (Module 3)

Q31: Set up the Normal Equations for A = [1 0; 1 1; 1 2], b = (6, 0, 0).

• Solution: [3 3; 3 5]x̂ = (6, 0).

Q32: Solve the system in Q31.

• Solution: 3x1+5x2=0⇒x1=−5/3x23x1+5x2=0⇒x1=−5/3x2.

3(−5/3x2)+3x2=6⇒−2x2=6⇒x2=−33(−5/3x2)+3x2=6⇒−2x2=6⇒x2=−3.

x1=5x1=5. x^=(5,−3)x^=(5,−3).

Q33: Calculate the projection of b onto C(A) using Q32.

• Solution: p = Ax̂ = (5, 2, −1).
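`np.linalg.lstsq` solves Q31–Q33 in one call (internally via SVD, avoiding the Normal Equations):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

xhat, *_ = np.linalg.lstsq(A, b, rcond=None)
p = A @ xhat    # projection of b onto C(A)
```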

Q34: What is the computational complexity of Conjugate Gradient vs Gaussian Elimination?

• Solution: CG is O(m·k) (where m is the number of non-zeros and k the number of iterations). Gaussian elimination is O(n³). CG is far faster for sparse matrices.

Q35: Explain “Randomized Matrix Multiplication”.

• Solution: Approximating AB by sampling columns of A and rows of B with probability proportional to their norms.

Q36: What is the Rayleigh Quotient?

• Solution: R(x) = xᵀAx / xᵀx. Its maximum value is λmax.

Q37: Why use the L1 norm (Lasso) in Regression?

• Solution: It induces sparsity (sets coefficients to exactly zero), performing feature selection.

Q38: Define a Krylov Subspace.

• Solution: The space spanned by {b, Ab, A²b, …}.

Topic: Linear Programming (Module 4)

Q39: Formulate:
Maximize 3x + 5y s.t. x ≤ 4, 2y ≤ 12, 3x + 2y ≤ 18.

• Solution: Max Z = 3x + 5y s.t. x ≤ 4, y ≤ 6, 3x + 2y ≤ 18, x, y ≥ 0.
Q40: Convert Q39 to Standard Form (Slack variables).

• Solution: Max 3x + 5y + 0s1 + 0s2 + 0s3.

Constraints: x + s1 = 4; 2y + s2 = 12; 3x + 2y + s3 = 18.
Q41: Find the Dual of Q39.

• Solution: Min 4y1 + 12y2 + 18y3

s.t. y1 + 3y3 ≥ 3; 2y2 + 2y3 ≥ 5; y1, y2, y3 ≥ 0.

Q42: Solve Q39 graphically (find the optimal intersection).

• Solution: Intersection of y = 6 and 3x + 2y = 18:

3x + 12 = 18 ⇒ 3x = 6 ⇒ x = 2. Point (2, 6).

Check x ≤ 4. Valid.

Z = 3(2) + 5(6) = 36.
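`scipy.optimize.linprog` confirms the graphical solution (it minimizes, so the objective is negated):

```python
import numpy as np
from scipy.optimize import linprog

res = linprog(c=[-3, -5],                     # maximize 3x + 5y
              A_ub=[[1, 0], [0, 2], [3, 2]],  # x <= 4, 2y <= 12, 3x + 2y <= 18
              b_ub=[4, 12, 18],
              bounds=[(0, None), (0, None)])  # x, y >= 0
x_opt, z_max = res.x, -res.fun                # optimum at (2, 6) with Z = 36
```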
Q43: How do you identify the Pivot Column in Simplex?

• Solution: The column with the most negative coefficient in the objective row (Max
problem).

Q44: How do you identify the Pivot Row?

• Solution: The row with the minimum positive ratio (Solution / Pivot Column value).

Q45: What is a “Shadow Price”?

• Solution: The value of the Dual variable. It equals the increase in Z if a constraint is relaxed by 1 unit.

Q46: Interpret “Complementary Slackness”.

• Solution: If a primal constraint has slack (> 0), the corresponding dual price is 0.

Q47: What is the Big-M method?

• Solution: It adds artificial variables with large penalties (M) to handle ≥ constraints in Simplex.

Q48: If Primal is Unbounded, the Dual is…?

• Solution: Infeasible.

Q49: What is a Convex Hull?

• Solution: The set of all convex combinations of a set of points. A bounded LPP feasible region is the convex hull of its vertices.

Q50: Verify the vertex (0, 0) for Max x + y s.t. x + y ≤ 1.

• Solution: Z = 0. The neighboring vertex (1, 0) gives Z = 1. Not optimal.


PART III: CONVEX OPTIMIZATION & DUALITY (Modules 5 & 6)
Module 5: Convex Optimization

Syllabus Focus: Convex sets/functions, Gradient Descent, Newton’s Method.

Key Concepts

1. Convexity: A function is convex if its Hessian is Positive Semidefinite (∇²f ⪰ 0). Local min = global min.

2. Gradient Descent: x_new = x − t∇f(x). First-order method.

3. Newton’s Method: x_new = x − H⁻¹∇f(x). Second-order method (uses curvature). Converges much faster but is computationally heavy.

Module 6: Duality & KKT

Syllabus Focus: Lagrange Multipliers, KKT Conditions, Sensitivity.

Key Concepts

1. Lagrange Function: L(x, λ) = f(x) + Σᵢ λᵢgᵢ(x).

2. KKT Conditions: Necessary conditions for optimality.

o Stationarity (∇L = 0).

o Primal/Dual Feasibility.

o Complementary Slackness (λg(x) = 0).


3. Strong Duality: When Primal Min = Dual Max. Holds if Slater’s Condition is met
(strictly feasible point exists).

Practice Problem Set: Modules 5 & 6 (25 Questions)

Topic: Convexity & Algorithms (Module 5)

Q51: Prove the unit ball ||x|| ≤ 1 is convex.

• Solution: Triangle inequality: ||θx + (1−θ)y|| ≤ θ||x|| + (1−θ)||y|| ≤ θ + (1−θ) = 1.

Q52: Is f(x) = x³ convex?

• Solution: f′′(x) = 6x, which is negative when x < 0. No.

Q53: Check the convexity of f(x, y) = x² + y² + 2xy.

• Solution: The Hessian has eigenvalues 4 and 0. Both ≥ 0. Yes, convex.

Q54: Perform one Gradient Descent step for f(x) = x² − 4x at x0 = 0, t = 0.5.

• Solution: ∇f = 2x − 4. ∇f(0) = −4.

x1 = 0 − 0.5(−4) = 2.
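The step in code, as a minimal sketch:

```python
def grad(x):
    # f(x) = x^2 - 4x  =>  f'(x) = 2x - 4
    return 2 * x - 4

x, t = 0.0, 0.5
x = x - t * grad(x)   # 0 - 0.5 * (-4) = 2
```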

Q55: Calculate the Newton step for f(x) = eˣ + x at x = 0.

• Solution: f′(0) = 2, f′′(0) = 1. Step Δx = −2/1 = −2. x_new = −2.
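The same step in code:

```python
import math

x = 0.0
f1 = math.exp(x) + 1   # f'(x) = e^x + 1 -> 2 at x = 0
f2 = math.exp(x)       # f''(x) = e^x   -> 1 at x = 0
x_new = x - f1 / f2    # Newton step: 0 - 2/1 = -2
```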
Q56: Why does Newton’s method converge faster?
• Solution: It approximates the function as a quadratic (parabola) rather than a plane,
adjusting for curvature.

Q57: Explain Backtracking Line Search.

• Solution: Dynamically reducing the step size t until the decrease in f(x) is sufficient (Armijo rule).

Q58: What is a Posynomial (Geometric Programming)?

• Solution: A sum of monomials c·x1^a1 ··· xn^an with c > 0.

Q59: What is the condition-number problem in Gradient Descent?

• Solution: High condition numbers cause “zig-zagging” in narrow valleys, slowing convergence.

Q60: Define the Epigraph.

• Solution: The set of points lying on or above the graph of f. f is convex iff its epigraph is convex.

Q61: Steepest Descent in the L1 norm.

• Solution: It updates the single coordinate with the largest partial derivative (Coordinate Descent).

Q62: Is log(eˣ + eʸ) convex?

• Solution: Yes (the Log-Sum-Exp function is convex).

Q63: Geometric interpretation of Gradient.

• Solution: Vector pointing in the direction of steepest ascent.

Topic: Duality & KKT (Module 6)

Q64: Find the Dual Function for Min x² s.t. x ≤ −2.

• Solution: g(λ) = −λ²/4 + 2λ.
Q65: Solve Q64 for the optimal λ.

• Solution: Maximize g(λ): g′(λ) = −λ/2 + 2 = 0 ⇒ λ = 4.
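Maximizing g(λ) on a grid reproduces both the optimal multiplier and the zero duality gap (the primal optimum is x = −2 with x² = 4):

```python
import numpy as np

lam = np.linspace(0.0, 8.0, 8001)   # lambda >= 0
g = -lam**2 / 4 + 2 * lam           # dual function from Q64
lam_star = lam[np.argmax(g)]        # maximizer, analytically 4
g_max = g.max()                     # dual optimum, equals primal optimum 4
```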

Q66: Define the Duality Gap.

• Solution: Primal optimal − dual optimal. It is zero for convex problems satisfying Slater’s condition (Strong Duality).

Q67: State Slater’s Condition.

• Solution: If a strictly feasible point exists (g(x) < 0), Strong Duality holds.

Q68: Check the KKT conditions for Min x s.t. x ≥ 1 at x = 0.

• Solution: Primal feasibility fails (0 is not ≥ 1). KKT not satisfied.

Q69: Write the KKT Stationarity condition for Min cᵀx s.t. Ax = b.

• Solution: c + Aᵀν = 0.

Q70: Why is λ called a “Price”?

• Solution: λ = ∂p*/∂b: the rate of change of the optimal cost with respect to the constraint level.

Q71: Sensitivity Analysis: if λ = 5, what happens if the constraint is relaxed by 1 unit?

• Solution: The cost improves (decreases) by approximately 5.

Q72: State the Saddle Point property.

• Solution: The optimum is a Min of the Lagrangian over x and a Max over λ.

Q73: Necessary vs Sufficient KKT.

• Solution: Necessary for all local optima. Sufficient only for Convex problems.

Q74: What is the Dual of the SVM?

• Solution: It maximizes the margin using only inner products xᵢᵀxⱼ, which enables kernels.

Q75: If λᵢ > 0 at the optimum, what do we know about the constraint?

• Solution: The constraint is active (equality holds, gᵢ(x) = 0).
