Contents
4.2 Eigenvalues and Eigenvectors 105
4.3 Cholesky Decomposition 114
4.4 Eigendecomposition and Diagonalization 115
4.5 Singular Value Decomposition 119
4.6 Matrix Approximation 129
4.7 Matrix Phylogeny 134
4.8 Further Reading 135
Exercises 137
5 Vector Calculus 139
5.1 Differentiation of Univariate Functions 141
5.2 Partial Differentiation and Gradients 146
5.3 Gradients of Vector-Valued Functions 149
5.4 Gradients of Matrices 155
5.5 Useful Identities for Computing Gradients 158
5.6 Backpropagation and Automatic Differentiation 159
5.7 Higher-Order Derivatives 164
5.8 Linearization and Multivariate Taylor Series 165
5.9 Further Reading 170
Exercises 170
6 Probability and Distributions 172
6.1 Construction of a Probability Space 172
6.2 Discrete and Continuous Probabilities 178
6.3 Sum Rule, Product Rule, and Bayes’ Theorem 183
6.4 Summary Statistics and Independence 186
6.5 Gaussian Distribution 197
6.6 Conjugacy and the Exponential Family 205
6.7 Change of Variables/Inverse Transform 214
6.8 Further Reading 221
Exercises 222
7 Continuous Optimization 225
7.1 Optimization Using Gradient Descent 227
7.2 Constrained Optimization and Lagrange Multipliers 233
7.3 Convex Optimization 236
7.4 Further Reading 246
Exercises 247
Part II Central Machine Learning Problems 249
8 When Models Meet Data 251
8.1 Data, Models, and Learning 251
8.2 Empirical Risk Minimization 258
8.3 Parameter Estimation 265
8.4 Probabilistic Modeling and Inference 272
8.5 Directed Graphical Models 278
Draft (2021-07-29) of “Mathematics for Machine Learning”. Feedback: https://mml-book.com.