COMP721 – Machine Learning
INTRODUCTION
Prof Serestina Viriri
Email: viriris@ukzn.ac.za
Machine Learning: Overview
A Few Quotes
• “A breakthrough in machine learning would be worth ten
Microsofts” (Bill Gates, Chairman, Microsoft)
• “Machine learning is the next Internet” (Tony Tether, Director, DARPA)
• “Machine learning is the hot new thing” (John Hennessy, President, Stanford)
• “Web rankings today are mostly a matter of machine learning”
(Prabhakar Raghavan, Dir. Research, Yahoo)
• “Machine learning is going to result in a real revolution” (Greg
Papadopoulos, CTO, Sun)
• “Machine learning is today’s discontinuity” (Jerry Yang, CEO, Yahoo)
Course Coordinator
Serestina Viriri (Prof.)
– Email: viriris@ukzn.ac.za
– Website:
• https://learn2022.ukzn.ac.za/course/view.php?id=610 (WVL)
• https://learn2022.ukzn.ac.za/course/view.php?id=609 (PMB)
Timetable

Activity              Time                   Venue
Lectures (Tutorial)   Thursday 10:00–12:30   Online
Consultation Time     By appointment         Online
Evaluation
• Test 1: 08 September 2022
• Test 2: 20 October 2022
• Assignments: (refer to the assignment outline)
• FINAL MARK = Test 1 (30%) + Test 2 (30%) + Assignments (20%) + Project (20%)
Course Objectives
• To provide an in-depth introduction to the two main areas of machine learning: supervised and unsupervised learning.
• To cover some of the main models and algorithms for regression, classification, clustering and decision processes.
Background Requirements
• Mathematical Tools
– Linear algebra, set theory, vectors
– Statistics, probability
– Optimization
• Algorithms and Computer Programming
– A high-level programming language: Python (scikit-learn), Weka, R
– LaTeX (scientific typesetting)
Topics Covered
• Introduction to Machine Learning
• Inductive Learning
• Decision Trees
• Instance-based Learning, MLE and EM Algorithms
• Bayesian Learning
• Neural Networks
• Model Ensembles
• Learning Theory (Deep Learning)
• Support Vector Machines and Kernel Methods
• Clustering and Dimensionality Reduction
Textbooks and Notes
• T. Mitchell, Machine Learning, McGraw-Hill, 1997.
• R. Duda, P. Hart and D. Stork, Pattern Classification, 2nd Edition, Wiley, 2001.
• K.P. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012.
• C. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
• Course Website:
– https://learn2022.ukzn.ac.za/course/view.php?id=610 (WVL)
– https://learn2022.ukzn.ac.za/course/view.php?id=609 (PMB)
Impact of Machine Learning
• Core of ML: making predictions or decisions from data.
• Machine learning is arguably the greatest export from computer science to other scientific fields.
So What Is Machine Learning?
• Automating automation
• Getting computers to program themselves
• Writing software is the bottleneck
• Let the data do the work instead!
Traditional Programming
Data + Program → Computer → Output

Machine Learning
Data + Output → Computer → Program
ML in a Nutshell
• Tens of thousands of machine learning algorithms exist
• Hundreds of new ones appear every year
• Every machine learning algorithm has three components:
– Representation
– Evaluation
– Optimization
What is Machine Learning?
• Adapt to / learn from data
– To optimize a performance function
• Can be used to:
– Extract knowledge from data
– Learn tasks that are difficult to formalise
– Create software that improves over time
Generic methods
• Learning from labelled data (supervised learning)
– E.g. classification, regression, prediction, function approximation
• Learning from unlabelled data (unsupervised learning)
– E.g. clustering, visualisation, dimensionality reduction
• Learning from sequential data
– E.g. speech recognition, DNA data analysis
• Associations
• Reinforcement learning
Statistical Learning
Machine learning methods can be unified within
the framework of statistical learning:
– Data is considered to be a sample from a
probability distribution.
– Typically, we don’t expect perfect learning but
only “probably correct” learning.
– Statistical concepts are the key to measuring our
expected performance on novel problem
instances.
Induction and Inference
• Induction: Generalizing from specific examples.
• Inference: Drawing conclusions from possibly
incomplete knowledge.
Learning machines need to do both.
Machine Learning Applications
Claim:
The decision to use machine learning
is more important than the choice of
a particular learning method.
The machine learning framework
• Apply a prediction function to a feature representation of the
image to get the desired output:
f(image) = “apple”
f(image) = “tomato”
f(image) = “cow”
The machine learning framework
y = f(x)
– y: output
– f: prediction function
– x: image features
• Training: given a training set of labeled examples {(x1,y1), …, (xN,yN)}, estimate the prediction function f by minimizing the prediction error on the training set.
• Testing: apply f to a never-before-seen test example x and output the predicted value y = f(x). See the sketch below.
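A minimal sketch of this train/test loop in scikit-learn (assumed installed per Background Requirements; the iris dataset and logistic-regression model are illustrative choices, not prescribed by the slides):

    # Training: estimate f on labeled examples; Testing: apply f to held-out x.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression

    X, y = load_iris(return_X_y=True)                  # examples x_i, labels y_i
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)           # hold out unseen examples

    f = LogisticRegression(max_iter=1000)
    f.fit(X_train, y_train)                            # minimize error on the training set
    y_pred = f.predict(X_test)                         # y = f(x) on new x
    print("test accuracy:", f.score(X_test, y_test))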
Steps

Training: Training Images → Image Features → (learn, using Training Labels) → Learned Model
Testing: Test Image → Image Features → Learned Model → Prediction
Classifiers: Nearest neighbor
[Figure: training examples from class 1 and class 2, with a test example between them]
f(x) = label of the training example nearest to x
• All we need is a distance function for our inputs
• No training required! (A direct implementation is sketched below.)
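A minimal sketch of the rule f(x) = label of the nearest training example, assuming Euclidean distance (any distance function would do) and a tiny hypothetical 2-D dataset:

    import numpy as np

    def nearest_neighbor_predict(X_train, y_train, x):
        """Return the label of the training example nearest to x."""
        dists = np.linalg.norm(X_train - x, axis=1)  # distance to every training point
        return y_train[np.argmin(dists)]             # label of the closest one

    X_train = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
    y_train = np.array([1, 1, 2, 2])
    print(nearest_neighbor_predict(X_train, y_train, np.array([0.5, 0.2])))  # -> 1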
Classifiers: Linear
• Find a linear function to separate the classes:
f(x) = sgn(w · x + b)
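The same rule in code, with illustrative (not learned) weights; in practice w and b are estimated from data, e.g. by an SVM or logistic regression, covered later:

    import numpy as np

    def linear_classify(w, b, x):
        """f(x) = sgn(w . x + b): which side of the hyperplane x falls on."""
        return np.sign(np.dot(w, x) + b)

    w, b = np.array([1.0, -1.0]), 0.5                    # illustrative parameters
    print(linear_classify(w, b, np.array([2.0, 0.0])))   # -> 1.0
    print(linear_classify(w, b, np.array([0.0, 3.0])))   # -> -1.0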
Many classifiers to choose from
• SVM
• Neural networks
• Naïve Bayes
• Bayesian network
• Logistic regression
• Randomized forests
• Boosted decision trees
• K-nearest neighbor
• RBMs
• Etc.
Which is the best one?
Generalization
[Figure: a training set (labels known) and a test set (labels unknown)]
• How well does a learned model generalize from the data it was trained on to a new test set?
Generalization
• Components of generalization error
– Bias: how much the average model over all training sets differs from the true model
• Error due to inaccurate assumptions/simplifications made by the model
– Variance: how much models estimated from different training sets differ from each other
• Underfitting: model is too “simple” to represent all the relevant class characteristics
– High bias and low variance
– High training error and high test error
• Overfitting: model is too “complex” and fits irrelevant characteristics (noise) in the data
– Low bias and high variance
– Low training error and high test error
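Both regimes can be seen numerically by fitting polynomials of increasing degree to noisy data (a sketch; the dataset, noise level and degrees are all illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0, 1, 30)
    y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, x.shape)    # training targets
    y2 = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, x.shape)   # test targets (fresh noise)

    for degree in (1, 3, 15):                # underfit, reasonable fit, overfit
        coeffs = np.polyfit(x, y, degree)
        train_mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
        test_mse = np.mean((np.polyval(coeffs, x) - y2) ** 2)
        print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")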
No Free Lunch Theorem
• No learning algorithm is universally best: averaged over all possible problems, every algorithm performs the same, so good performance comes from assumptions that match the problem at hand.
Bias-Variance Trade-off
• Models with too few parameters are inaccurate because of a large bias (not enough flexibility).
• Models with too many parameters are inaccurate because of a large variance (too much sensitivity to the sample).
Bias-Variance Trade-off
E(MSE) = noise² + bias² + variance
– noise²: unavoidable error
– bias²: error due to incorrect assumptions
– variance: error due to variance of the training samples
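The three terms can be estimated by Monte Carlo: fit the same model class to many independent training sets and examine the predictions at one query point (a sketch; the true function, model class and settings are all illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    f_true = lambda x: np.sin(2 * np.pi * x)   # assumed true function
    noise_sd, x0 = 0.3, 0.25                   # noise level, query point

    preds = []
    for _ in range(2000):                      # many independent training sets
        x = rng.uniform(0, 1, 20)
        y = f_true(x) + rng.normal(0, noise_sd, x.size)
        w1, w0 = np.polyfit(x, y, 1)           # fixed model class: a straight line
        preds.append(w1 * x0 + w0)

    preds = np.array(preds)
    bias2 = (preds.mean() - f_true(x0)) ** 2   # (mean prediction - true value)^2
    variance = preds.var()                     # spread across training sets
    print(f"noise²={noise_sd**2:.3f}  bias²={bias2:.3f}  variance={variance:.3f}")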
Bias-variance tradeoff
[Figure: training and test error vs. model complexity; test error is U-shaped while training error keeps falling. Low complexity gives underfitting (high bias, low variance); high complexity gives overfitting (low bias, high variance)]
Bias-variance tradeoff
[Figure: test error vs. model complexity, plotted for few and for many training examples; low complexity corresponds to high bias and low variance, high complexity to low bias and high variance]
Effect of Training Size
[Figure: for a fixed prediction model, training and testing error vs. number of training examples; the gap between the curves is the generalization error]
Slide credit: D. Hoiem
The perfect classification algorithm
• Objective function: encodes the right loss for the problem
• Parameterization: makes assumptions that fit the problem
• Regularization: right level of regularization for the amount of training data
• Training algorithm: can find parameters that maximize the objective on the training set
• Inference algorithm: can solve for the objective function in evaluation
Remember…
• No classifier is inherently
better than any other: you
need to make assumptions to
generalize
• Three kinds of error
– Inherent: unavoidable
– Bias: due to over-simplifications
– Variance: due to inability to
perfectly estimate parameters
from limited data
How to reduce variance?
• Choose a simpler classifier
• Regularize the parameters
• Get more training data
Slide credit: D. Hoiem
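The second option in code: ridge regression shrinks the weights through the penalty strength alpha (a sketch; alpha is an illustrative value, normally chosen by cross-validation):

    import numpy as np
    from sklearn.linear_model import LinearRegression, Ridge

    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 10))              # few examples, many features
    y = X[:, 0] + rng.normal(0, 0.5, 30)       # only feature 0 actually matters

    ols = LinearRegression().fit(X, y)
    ridge = Ridge(alpha=10.0).fit(X, y)        # regularize the parameters
    print("OLS weight norm:  ", np.linalg.norm(ols.coef_))
    print("ridge weight norm:", np.linalg.norm(ridge.coef_))  # smaller: lower variance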
What to remember about classifiers
• No free lunch: machine learning algorithms are tools, not
dogmas
• Try simple classifiers first
• Better to have smart features and simple classifiers than
simple features and smart classifiers
• Use increasingly powerful classifiers with more training
data (bias-variance tradeoff)
Representation
• Decision trees
• Sets of rules / logic programs
• Instances
• Graphical models (Bayes/Markov nets)
• Neural networks
• Support vector machines
• Model ensembles
• Etc.
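One representation from the list made concrete: a shallow decision tree fit with scikit-learn (a sketch; the dataset and depth are illustrative):

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    X, y = load_iris(return_X_y=True)
    tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
    print(export_text(tree))   # the learned tree, printed as nested if/else rules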
Evaluation
• Accuracy
• Precision and recall
• Squared error
• Likelihood
• Posterior probability
• Cost / utility
• Margin
• Entropy
• K-L divergence
• Etc.
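A few of these measures computed with scikit-learn on hypothetical true vs. predicted labels (a sketch):

    from sklearn.metrics import accuracy_score, precision_score, recall_score, log_loss

    y_true = [1, 0, 1, 1, 0, 1]
    y_pred = [1, 0, 0, 1, 1, 1]
    print("accuracy: ", accuracy_score(y_true, y_pred))
    print("precision:", precision_score(y_true, y_pred))
    print("recall:   ", recall_score(y_true, y_pred))

    # Likelihood-style measures score predicted probabilities, not hard labels.
    y_prob = [0.9, 0.2, 0.4, 0.8, 0.6, 0.7]
    print("log loss: ", log_loss(y_true, y_prob))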
Optimization
• Combinatorial optimization
– E.g.: Greedy search
• Convex optimization
– E.g.: Gradient descent
• Constrained optimization
– E.g.: Linear programming
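A minimal sketch of the convex case: gradient descent on the mean squared error of a one-parameter linear model (data, step size and iteration count are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(-1, 1, 50)
    y = 3.0 * x + rng.normal(0, 0.1, 50)       # data from y = 3x + noise

    w, lr = 0.0, 0.1
    for _ in range(100):
        grad = 2 * np.mean((w * x - y) * x)    # d/dw of the mean squared error
        w -= lr * grad                         # step downhill
    print("estimated w:", w)                   # close to 3.0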
Types of Learning
• Supervised (inductive) learning
– Training data includes desired outputs
• Unsupervised learning
– Training data does not include desired outputs
• Semi-supervised learning
– Training data includes a few desired outputs
• Reinforcement learning
– Rewards from sequence of actions
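The first two types, contrasted on the same data (a sketch; the iris dataset and the two models are illustrative):

    from sklearn.datasets import load_iris
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.cluster import KMeans

    X, y = load_iris(return_X_y=True)

    supervised = KNeighborsClassifier().fit(X, y)           # uses the desired outputs y
    print(supervised.predict(X[:3]))                        # predicted labels

    unsupervised = KMeans(n_clusters=3, n_init=10).fit(X)   # ignores the labels
    print(unsupervised.labels_[:3])                         # discovered cluster ids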
Homework
• Install the following:
– Python (Scikit-learn)
– Weka
• Familiarize yourself with the Python IDE
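After installing (e.g. with pip install scikit-learn), a quick check that the Python side works; Weka is a separate Java application with its own installer:

    import sklearn
    print("scikit-learn version:", sklearn.__version__)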