Course Outline

CS-567 – Machine Learning
Instructor: Dr. Nazar Khan
Semester: Fall 2015
Campus: Allama Iqbal
Course Description:
The ability of biological brains to sense, perceive, analyse and recognise patterns can only be
described as stunning. Furthermore, they can also learn from new examples. Mankind's
understanding of exactly how biological brains operate is embarrassingly limited.
However, numerous 'practical' techniques do exist that give machines the 'appearance' of
being intelligent. This is the domain of statistical pattern recognition and machine learning.
Instead of attempting to mimic the complex workings of a biological brain, this course aims to
explain mathematically well-founded techniques for analysing patterns and learning from
them.
Accordingly, this course is a mathematically involved introduction to the field of pattern
recognition and machine learning.
Goals and Objectives:
This course will prepare students for further study/research in the areas of Pattern Recognition,
Machine Learning, Computer Vision, Data Analysis and other areas attempting to solve
Artificial Intelligence (AI) type problems.
Text:
Pattern Recognition and Machine Learning by Christopher M. Bishop (2006)
Prerequisites:
The course is designed to be self-contained, so the required mathematical details will be
covered in the lectures. However, this is a math-heavy course. Students are encouraged to
brush up on their knowledge of
1. calculus (differentiation, partial derivatives)
2. linear algebra (vectors, matrices, orthogonality, eigenvectors, SVD)
3. probability and statistics
Students should know that the only way to benefit from this course is to be prepared to
spend many hours reading the textbook and attempting its exercises, preferably alone or
with a class-fellow.
Scheme of Study:
Outline
1. Introduction to Pattern Recognition and Machine Learning
2. Mathematical Background
3. Decision Theory
4. Information Theory
5. Probability Distributions
6. Density Estimation Methods
7. Linear Models for Regression
8. Linear Models for Classification
9. Neural Networks
10. Latent Variable Models
11. Dimensionality Reduction
Detail
1. Introduction (1.5 Weeks)
a. Overview of Machine Learning
b. Curve fitting (Over-fitting vs. Generalization)
c. Regularized curve fitting
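As a small taste of items (b) and (c), here is a minimal NumPy sketch (the data and the penalty value are hypothetical choices, not prescribed by the course) of fitting a polynomial to noisy samples of sin(2πx) with an L2 penalty, the running example of Bishop Ch. 1:

import numpy as np

# Illustrative sketch: fit a degree-9 polynomial to noisy sin(2*pi*x) samples
# with an L2 (ridge) penalty lam; without the penalty, degree 9 over-fits 10 points.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 10)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, x.size)

degree, lam = 9, 1e-3
Phi = np.vander(x, degree + 1)               # design matrix of polynomial features
# Regularized least squares: w = (Phi^T Phi + lam*I)^{-1} Phi^T t
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(degree + 1), Phi.T @ t)

x_new = np.linspace(0, 1, 5)
print(np.vander(x_new, degree + 1) @ w)      # predictions on new inputs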
2. Mathematical Background (2.5 Weeks)
a. Probability
b. Gaussian distribution
c. Fitting a Gaussian distribution to data
d. Probabilistic curve fitting (Maximum Likelihood Estimation)
e. Bayesian curve fitting (Maximum Posterior Estimation)
f. Model selection (Cross Validation)
g. Calculus of variations
h. Constrained optimisation via Lagrange multipliers
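As an illustration of items (c) and (d), a minimal sketch of maximum likelihood fitting of a 1-D Gaussian (the observations below are hypothetical):

import numpy as np

# Maximum likelihood fit of a 1-D Gaussian: the ML estimates are the sample
# mean and the (biased) sample variance, obtained by setting the gradient of
# the log-likelihood to zero (Bishop Sec. 1.2.4).
data = np.array([1.2, 0.8, 1.5, 1.1, 0.9])   # hypothetical observations
mu_ml = data.mean()
var_ml = ((data - mu_ml) ** 2).mean()        # divides by N, not N-1
print(mu_ml, var_ml)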
3. Decision Theory (1 Week)
a. Minimising number of misclassifications
b. Minimising expected loss
c. Benefits of knowing posterior distributions
d. Generative vs. Discriminative vs. Discriminant functions
e. Loss functions for regression problems
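A minimal sketch of decision-making by minimising expected loss (the posterior values and loss matrix are hypothetical):

import numpy as np

# Decision theory sketch: given posterior class probabilities p(C_k|x) and a
# loss matrix L[k, j] = cost of choosing class j when the truth is class k,
# pick the class minimising the expected loss sum_k L[k, j] * p(C_k|x).
posterior = np.array([0.7, 0.3])             # hypothetical p(C_k|x)
loss = np.array([[0, 1],                     # cheap to misclassify class 0
                 [10, 0]])                   # expensive to misclassify class 1
expected_loss = posterior @ loss             # expected loss of each decision
print(expected_loss.argmin())                # may differ from argmax posterior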
4. Information Theory (1 Week)
a. Information ∝ 1/Probability
b. Entropy = expected information (measure of uncertainty)
i. Maximum Entropy Discrete Distribution (Uniform)
ii. Maximum Entropy Continuous Distribution (Gaussian)
c. Jensen's Inequality
d. Relative Entropy (KL divergence)
e. Mutual Information
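A minimal NumPy sketch of entropy and KL divergence for discrete distributions (the distributions below are hypothetical):

import numpy as np

# Information theory sketch: entropy H(p) = -sum p*log(p) is the expected
# information -log p; KL(p||q) = sum p*log(p/q) is the relative entropy and,
# by Jensen's inequality, is non-negative and zero only when p = q.
p = np.array([0.5, 0.25, 0.25])
q = np.array([1/3, 1/3, 1/3])                # uniform: the max-entropy choice
H = -(p * np.log2(p)).sum()                  # entropy of p in bits
KL = (p * np.log2(p / q)).sum()              # always >= 0
print(H, KL)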
5. Probability Distributions and Density Estimation (2 Weeks)
a. Density Estimation is fundamentally ill-posed
b. Probability Distributions
i. Bernoulli
ii. Binomial
iii. Beta
iv. Multinomial
v. Dirichlet
vi. Gaussian
c. Completing-the-square
d. Sequential Learning via Conjugate Priors
e. Density Estimation Methods
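A minimal sketch of sequential learning with a conjugate prior, using the Beta-Bernoulli pair (the prior counts and data stream are hypothetical):

# Sequential learning via conjugacy: a Beta(a, b) prior on the Bernoulli
# parameter mu stays Beta after every observation, so the posterior can be
# updated one data point at a time (Bishop Sec. 2.1).
a, b = 2.0, 2.0                              # hypothetical prior counts
for x in [1, 0, 1, 1]:                       # stream of coin flips
    a, b = a + x, b + (1 - x)                # posterior is Beta(a, b)
print(a / (a + b))                           # posterior mean of mu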
6. Linear Models for Regression (1.5 Weeks)
a. Least-squares estimation
b. Design matrix
c. Pseudoinverse
d. Regularized least-squares estimation
e. Linear regression for multivariate targets
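A minimal NumPy sketch of least-squares regression through the design matrix and pseudoinverse (the data are hypothetical):

import numpy as np

# Least-squares regression via the Moore-Penrose pseudoinverse: stack the
# basis-function values of each input into a design matrix Phi, then the ML
# weight vector is w = pinv(Phi) @ t (Bishop Sec. 3.1.1).
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
t = 2.0 * x + 1.0 + 0.1 * np.random.default_rng(0).normal(size=x.size)
Phi = np.column_stack([np.ones_like(x), x])  # design matrix for basis {1, x}
w = np.linalg.pinv(Phi) @ t                  # pseudoinverse solution
print(w)                                     # close to [1, 2]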
7. Linear Models for Classification (1.5 Weeks)
a. Discriminant Functions
i. Least-squares
ii. Fisher's Linear Discriminant (FLD)
iii. Perceptron
b. Probabilistic Generative Models
c. Probabilistic Discriminative Models
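A minimal sketch of the perceptron algorithm on linearly separable toy data (all values hypothetical):

import numpy as np

# Perceptron sketch: for targets t in {-1, +1}, add t_n * phi(x_n) to the
# weights for every misclassified point; the algorithm converges when the
# classes are linearly separable (Bishop Sec. 4.1.7).
X = np.array([[1, 2], [2, 3], [-1, -2], [-2, -1]], dtype=float)
t = np.array([1, 1, -1, -1])
Phi = np.column_stack([np.ones(len(X)), X])  # add a bias feature
w = np.zeros(Phi.shape[1])
for _ in range(100):
    for phi_n, t_n in zip(Phi, t):
        if t_n * (w @ phi_n) <= 0:           # misclassified point
            w += t_n * phi_n
print(np.sign(Phi @ w))                      # should match t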
8. Neural Networks (3 Weeks)
a. Back-propagation
b. Regularization Techniques
i. Early stopping
ii. Weight decay
iii. Training with transformed data
c. Convolutional Neural Networks
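A minimal NumPy sketch of back-propagation in a one-hidden-layer network (the architecture, learning rate and data are hypothetical choices, not course requirements):

import numpy as np

# Back-propagation sketch: a one-hidden-layer tanh network trained by gradient
# descent on squared error; the output error delta is propagated backwards
# through the layers to obtain all weight gradients (Bishop Sec. 5.3).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (50, 1))
T = np.sin(np.pi * X)                        # hypothetical regression targets

W1 = rng.normal(0, 0.5, (1, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 0.5, (8, 1)); b2 = np.zeros(1)
lr = 0.1
for _ in range(2000):
    H = np.tanh(X @ W1 + b1)                 # forward pass: hidden layer
    Y = H @ W2 + b2                          # linear output layer
    dY = (Y - T) / len(X)                    # output error delta
    dH = (dY @ W2.T) * (1 - H ** 2)          # back-propagate through tanh
    W2 -= lr * H.T @ dY; b2 -= lr * dY.sum(0)
    W1 -= lr * X.T @ dH; b1 -= lr * dH.sum(0)
print(np.mean((Y - T) ** 2))                 # training error after descent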
9. Latent Variable Models (1.5 Weeks)
a. K-means Clustering (alternating optimization)
b. Gaussian Mixture Models
c. Expectation Maximisation (EM) Algorithm
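A minimal sketch of K-means as alternating optimisation (toy data with two clusters; all values hypothetical):

import numpy as np

# K-means sketch: alternate between assigning each point to its nearest centre
# and recomputing each centre as the mean of its assigned points; each step
# can only decrease the distortion (Bishop Sec. 9.1).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (30, 2)), rng.normal(3, 0.5, (30, 2))])
mu = X[rng.choice(len(X), 2, replace=False)]              # initialize 2 centres
for _ in range(20):
    z = ((X[:, None] - mu[None]) ** 2).sum(-1).argmin(1)  # assignment step
    mu = np.array([X[z == k].mean(0) for k in range(2)])  # update step
print(mu)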
10. Dimensionality Reduction (0.5 Week)
a. Principal Component Analysis
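A minimal NumPy sketch of PCA via the singular value decomposition of the centred data (the data are hypothetical):

import numpy as np

# PCA sketch: the principal components are the eigenvectors of the data
# covariance matrix, obtained here from the SVD of the centred data; keeping
# the top components gives the best linear reconstruction (Bishop Sec. 12.1).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Xc = X - X.mean(0)                           # centre the data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
W = Vt[:2].T                                 # top-2 principal directions
Z = Xc @ W                                   # low-dimensional projection
print(S ** 2 / (len(X) - 1))                 # variance along each component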
Grading Scheme/Criteria:
Assignments: 20%
Quizzes: 5%
Mid-term: 35%
Final: 40%
Grading Policy:
1. Theoretical assignments have to be submitted before the lecture on the due date.
2. There will be no make-up for any missed quiz.
3. Make-up for a mid-term or final exam will be allowed only under exceptional
circumstances provided that the instructor has been notified beforehand.
4. The instructor reserves the right to deny requests for any make-up quiz or exam.
5. Worst score on quizzes will be dropped.
6. Worst score on assignments will be dropped.