CS-567 – Machine Learning

Instructor: Dr. Nazar Khan
Semester: Fall 2015
Campus: Allama Iqbal

Course Description:
The ability of biological brains to sense, perceive, analyse and recognise patterns can only be described as stunning. Furthermore, they have the ability to learn from new examples. Mankind's understanding of exactly how biological brains operate is embarrassingly limited. However, there do exist numerous 'practical' techniques that give machines the 'appearance' of being intelligent. This is the domain of statistical pattern recognition and machine learning. Instead of attempting to mimic the complex workings of a biological brain, this course aims at explaining mathematically well-founded techniques for analysing patterns and learning from them. Accordingly, this course is a mathematically involved introduction to the field of pattern recognition and machine learning. It will prepare students for further study/research in the areas of Pattern Recognition, Machine Learning, Computer Vision, Data Analysis and other areas attempting to solve Artificial Intelligence (AI) type problems.

Goals and Objectives:
This course will prepare students for further study/research in the areas of Pattern Recognition, Machine Learning, Computer Vision, Data Analysis and other areas attempting to solve Artificial Intelligence (AI) type problems.

Text:
Pattern Recognition and Machine Learning by Christopher M. Bishop (2006)

Prerequisites:
The course is designed to be self-contained, so the required mathematical details will be covered in the lectures. However, this is a math-heavy course. Students are encouraged to brush up on their knowledge of:
1. calculus (differentiation, partial derivatives)
2. linear algebra (vectors, matrices, orthogonality, eigenvectors, SVD)
3. probability and statistics
Students should know that the only way to benefit from this course is to be prepared to spend many hours reading the textbook and attempting its exercises, preferably alone or with a class-fellow.

Scheme of Study:

Outline
1. Introduction to Pattern Recognition and Machine Learning
2. Mathematical Background
3. Decision Theory
4. Information Theory
5. Probability Distributions
6. Density Estimation Methods
7. Linear Models for Regression
8. Linear Models for Classification
9. Neural Networks
10. Latent Variable Models
11. Dimensionality Reduction

Detail
1. Introduction (1.5 Weeks)
   a. Overview of Machine Learning
   b. Curve fitting (Over-fitting vs. Generalization)
   c. Regularized curve fitting
2. Mathematical Background (2.5 Weeks)
   a. Probability
   b. Gaussian distribution
   c. Fitting a Gaussian distribution to data
   d. Probabilistic curve fitting (Maximum Likelihood Estimation)
   e. Bayesian curve fitting (Maximum Posterior Estimation)
   f. Model selection (Cross Validation)
   g. Calculus of variations
   h. Constrained optimisation via Lagrange multipliers
3. Decision Theory (1 Week)
   a. Minimising the number of misclassifications
   b. Minimising expected loss
   c. Benefits of knowing posterior distributions
   d. Generative vs. Discriminative vs. Discriminant functions
   e. Loss functions for regression problems
4. Information Theory (1 Week)
   a. Information ∝ 1/Probability
   b. Entropy = expected information (a measure of uncertainty; formalised in the sketch after this section)
      i. Maximum Entropy Discrete Distribution (Uniform)
      ii. Maximum Entropy Continuous Distribution (Gaussian)
   c. Jensen's Inequality
   d. Relative Entropy (KL divergence)
   e. Mutual Information
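Items 4a and 4b above are shorthand for the standard definitions developed in Bishop's text; a minimal sketch in LaTeX, taking log base 2 by convention so that information is measured in bits:

    h(x) = -\log_2 p(x)                        % information content: improbable events carry more information
    H[X] = \mathbb{E}[h(X)]
         = -\sum_x p(x)\,\log_2 p(x)           % entropy: expected information, a measure of uncertainty
    \mathrm{KL}(p\,\|\,q) = -\sum_x p(x)\,\log\frac{q(x)}{p(x)}   % relative entropy (item 4d)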
5. Probability Distributions and Density Estimation (2 Weeks)
   a. Density Estimation is fundamentally ill-posed
   b. Probability Distributions
      i. Bernoulli
      ii. Binomial
      iii. Beta
      iv. Multinomial
      v. Dirichlet
      vi. Gaussian
   c. Completing-the-square
   d. Sequential Learning via Conjugate Priors
   e. Density Estimation Methods
6. Linear Models for Regression (1.5 Weeks)
   a. Least-squares estimation
   b. Design matrix
   c. Pseudoinverse
   d. Regularized least-squares estimation
   e. Linear regression for multivariate targets
7. Linear Models for Classification (1.5 Weeks)
   a. Discriminant Functions
      i. Least-squares
      ii. Fisher's Linear Discriminant (FLD)
      iii. Perceptron
   b. Probabilistic Generative Models
   c. Probabilistic Discriminative Models
8. Neural Networks (3 Weeks)
   a. Back-propagation
   b. Regularization Techniques
      i. Early stopping
      ii. Weight decay
      iii. Training with transformed data
   c. Convolutional Neural Networks
9. Latent Variable Models (1.5 Weeks)
   a. K-means Clustering (alternating optimization)
   b. Gaussian Mixture Models
   c. Expectation Maximisation (EM) Algorithm
10. Dimensionality Reduction (0.5 Week)
    a. Principal Component Analysis

Grading Scheme/Criteria:

    Component      Weight
    Assignments    20%
    Quizzes         5%
    Mid-term       35%
    Final          40%

Grading Policy:
1. Theoretical assignments must be submitted before the lecture on the due date.
2. There will be no make-up for any missed quiz.
3. A make-up for the mid-term or final exam will be allowed only under exceptional circumstances, provided that the instructor has been notified beforehand.
4. The instructor reserves the right to deny requests for any make-up quiz or exam.
5. The worst score on quizzes will be dropped.
6. The worst score on assignments will be dropped. (Policies 5 and 6 are illustrated in the sketch after this list.)
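To make the weights and the drop-worst rules concrete, here is a minimal illustrative sketch in Python; the function names, and the assumption that every score is a percentage out of 100, are hypothetical rather than part of the official policy:

    # Illustrative only: computes the weighted total described above.
    # Assumes all scores are percentages in [0, 100].

    def final_grade(assignments, quizzes, mid, final):
        """Weights: assignments 20%, quizzes 5%, mid-term 35%, final 40%.
        The worst assignment and the worst quiz are dropped (policies 5 and 6)."""
        def mean_without_worst(scores):
            # Drop the single lowest score, unless only one score exists.
            kept = sorted(scores)[1:] if len(scores) > 1 else list(scores)
            return sum(kept) / len(kept)

        return (0.20 * mean_without_worst(assignments)
                + 0.05 * mean_without_worst(quizzes)
                + 0.35 * mid
                + 0.40 * final)

    # e.g. final_grade([80, 90, 70], [60, 100, 90], 75, 85) ≈ 82.0 out of 100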