Foreword

Table of Symbols

Symbol                                                                      Typical meaning
$a, b, c, \alpha, \beta, \gamma$                                            Scalars are lowercase
$\boldsymbol{x}, \boldsymbol{y}, \boldsymbol{z}$                            Vectors are bold lowercase
$\boldsymbol{A}, \boldsymbol{B}, \boldsymbol{C}$                            Matrices are bold uppercase
$\boldsymbol{x}^\top, \boldsymbol{A}^\top$                                  Transpose of a vector or matrix
$\boldsymbol{A}^{-1}$                                                       Inverse of a matrix
$\langle \boldsymbol{x}, \boldsymbol{y} \rangle$                            Inner product of $\boldsymbol{x}$ and $\boldsymbol{y}$
$\boldsymbol{x}^\top \boldsymbol{y}$                                        Dot product of $\boldsymbol{x}$ and $\boldsymbol{y}$
$B = (\boldsymbol{b}_1, \boldsymbol{b}_2, \boldsymbol{b}_3)$                (Ordered) tuple
$\boldsymbol{B} = [\boldsymbol{b}_1, \boldsymbol{b}_2, \boldsymbol{b}_3]$   Matrix of column vectors stacked horizontally
$\mathcal{B} = \{\boldsymbol{b}_1, \boldsymbol{b}_2, \boldsymbol{b}_3\}$    Set of vectors (unordered)
$\mathbb{Z}, \mathbb{N}$                                                    Integers and natural numbers, respectively
$\mathbb{R}, \mathbb{C}$                                                    Real and complex numbers, respectively
$\mathbb{R}^n$                                                              $n$-dimensional vector space of real numbers
$\forall x$                                                                 Universal quantifier: for all $x$
$\exists x$                                                                 Existential quantifier: there exists $x$
$a := b$                                                                    $a$ is defined as $b$
$a =: b$                                                                    $b$ is defined as $a$
$a \propto b$                                                               $a$ is proportional to $b$, i.e., $a = \text{constant} \cdot b$
$g \circ f$                                                                 Function composition: "$g$ after $f$"
$\iff$                                                                      If and only if
$\implies$                                                                  Implies
$\mathcal{A}, \mathcal{C}$                                                  Sets
$a \in \mathcal{A}$                                                         $a$ is an element of set $\mathcal{A}$
$\emptyset$                                                                 Empty set
$\mathcal{A} \setminus \mathcal{B}$                                         $\mathcal{A}$ without $\mathcal{B}$: the set of elements in $\mathcal{A}$ but not in $\mathcal{B}$
$D$                                                                         Number of dimensions; indexed by $d = 1, \dots, D$
$N$                                                                         Number of data points; indexed by $n = 1, \dots, N$
$\boldsymbol{I}_m$                                                          Identity matrix of size $m \times m$
$\boldsymbol{0}_{m,n}$                                                      Matrix of zeros of size $m \times n$
$\boldsymbol{1}_{m,n}$                                                      Matrix of ones of size $m \times n$
$\boldsymbol{e}_i$                                                          Standard/canonical vector (where $i$ is the component that is 1)
$\dim$                                                                      Dimensionality of vector space
$\operatorname{rk}(\boldsymbol{A})$                                         Rank of matrix $\boldsymbol{A}$
$\operatorname{Im}(\Phi)$                                                   Image of linear mapping $\Phi$
$\ker(\Phi)$                                                                Kernel (null space) of a linear mapping $\Phi$
$\operatorname{span}[\boldsymbol{b}_1]$                                     Span (generating set) of $\boldsymbol{b}_1$
$\operatorname{tr}(\boldsymbol{A})$                                         Trace of $\boldsymbol{A}$
$\det(\boldsymbol{A})$                                                      Determinant of $\boldsymbol{A}$
$|\cdot|$                                                                   Absolute value or determinant (depending on context)
$\|\cdot\|$                                                                 Norm; Euclidean, unless specified
$\lambda$                                                                   Eigenvalue or Lagrange multiplier
$E_\lambda$                                                                 Eigenspace corresponding to eigenvalue $\lambda$
$\boldsymbol{x} \perp \boldsymbol{y}$                                       Vectors $\boldsymbol{x}$ and $\boldsymbol{y}$ are orthogonal
$V$                                                                         Vector space
$V^\perp$                                                                   Orthogonal complement of vector space $V$
$\sum_{n=1}^N x_n$                                                          Sum of the $x_n$: $x_1 + \ldots + x_N$
$\prod_{n=1}^N x_n$                                                         Product of the $x_n$: $x_1 \cdot \ldots \cdot x_N$
$\boldsymbol{\theta}$                                                       Parameter vector
$\frac{\partial f}{\partial x}$                                             Partial derivative of $f$ with respect to $x$
$\frac{\mathrm{d}f}{\mathrm{d}x}$                                           Total derivative of $f$ with respect to $x$
$\nabla$                                                                    Gradient
$f_* = \min_x f(x)$                                                         The smallest function value of $f$
$x_* \in \arg\min_x f(x)$                                                   The value $x_*$ that minimizes $f$ (note: $\arg\min$ returns a set of values)
$\mathfrak{L}$                                                              Lagrangian
$\mathcal{L}$                                                               Negative log-likelihood
$\binom{n}{k}$                                                              Binomial coefficient, $n$ choose $k$
$\mathbb{V}_X[x]$                                                           Variance of $x$ with respect to the random variable $X$
$\mathbb{E}_X[x]$                                                           Expectation of $x$ with respect to the random variable $X$
$\operatorname{Cov}_{X,Y}[x, y]$                                            Covariance between $x$ and $y$
$X \perp\!\!\!\perp Y \mid Z$                                               $X$ is conditionally independent of $Y$ given $Z$
$X \sim p$                                                                  Random variable $X$ is distributed according to $p$
$\mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$                        Gaussian distribution with mean $\boldsymbol{\mu}$ and covariance $\boldsymbol{\Sigma}$
$\operatorname{Ber}(\mu)$                                                   Bernoulli distribution with parameter $\mu$
$\operatorname{Bin}(N, \mu)$                                                Binomial distribution with parameters $N, \mu$
$\operatorname{Beta}(\alpha, \beta)$                                        Beta distribution with parameters $\alpha, \beta$

Table of Abbreviations and Acronyms

Acronym   Meaning
e.g.      Exempli gratia (Latin: for example)
GMM       Gaussian mixture model
i.e.      Id est (Latin: this means)
i.i.d.    Independent, identically distributed
MAP       Maximum a posteriori
MLE       Maximum likelihood estimation/estimator
ONB       Orthonormal basis
PCA       Principal component analysis
PPCA      Probabilistic principal component analysis
REF       Row-echelon form
SPD       Symmetric, positive definite
SVM       Support vector machine

Draft (2022-01-11) of “Mathematics for Machine Learning”. Feedback: https://mml-book.com.
©2021 M. P. Deisenroth, A. A. Faisal, C. S. Ong. Published by Cambridge University Press (2020).