COMP721 – Machine Learning
INTRODUCTION
Prof Serestina Viriri
Email: viriris@ukzn.ac.za
2023/07/26

Machine Learning: Overview

A Few Quotes
• “A breakthrough in machine learning would be worth ten Microsofts” (Bill Gates, Chairman, Microsoft)
• “Machine learning is the next Internet” (Tony Tether, Director, DARPA)
• “Machine learning is the hot new thing” (John Hennessy, President, Stanford)
• “Web rankings today are mostly a matter of machine learning” (Prabhakar Raghavan, Dir. Research, Yahoo)
• “Machine learning is going to result in a real revolution” (Greg Papadopoulos, CTO, Sun)
• “Machine learning is today’s discontinuity” (Jerry Yang, CEO, Yahoo)

Course Coordinator
Serestina Viriri (Prof.)
– Email: viriris@ukzn.ac.za
– Website:
  • https://learn2022.ukzn.ac.za/course/view.php?id=610 (WVL)
  • https://learn2022.ukzn.ac.za/course/view.php?id=609 (PMB)

Timetable
Activity                Time                  Venue
Lectures (Tutorial)     Thur 10:00 – 12:30    Online
Consultation            By appointment        Online

Evaluation
• Test 1: 08 September 2022
• Test 2: 20 October 2022
• Assignments: (refer to the assignment outline)
• FINAL MARK = Test 1 (30%) + Test 2 (30%) + Assignments (20%) + Project (20%)

Course Objectives
• To provide an in-depth introduction to the two main areas of Machine Learning: supervised and unsupervised learning.
• The course covers some of the main models and algorithms for regression, classification, clustering and decision processes.

Background Requirements
• Mathematical Tools
  – Linear algebra, set theory, vectors
  – Statistics, probability
  – Optimization
• Algorithms and Computer Programming
  – A high-level programming language: Python (scikit-learn), Weka, R
  – LaTeX (scientific typesetting system)

Topics Covered
• Introduction to Machine Learning
• Inductive Learning
• Decision Trees
• Instance-based Learning, MLE and EM Algorithms
• Bayesian Learning
• Neural Networks
• Model Ensembles
• Learning Theory (Deep Learning)
• Support Vector Machines and Kernel Methods
• Clustering and Dimensionality Reduction

Textbooks and Notes
• T. Mitchell, Machine Learning, McGraw-Hill, 1997.
• R. Duda, P. Hart and D. Stork, Pattern Classification, 2nd Edition, Wiley, 2001.
• K.P. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012.
• C. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
• Course Website:
  – https://learn2022.ukzn.ac.za/course/view.php?id=610 (WVL)
  – https://learn2022.ukzn.ac.za/course/view.php?id=609 (PMB)

Impact of Machine Learning
• Core of ML: making predictions or decisions from data.
• Machine Learning is arguably the greatest export from Computer Science to other scientific fields.

So What Is Machine Learning?
• Automating automation
• Getting computers to program themselves
• Writing software is the bottleneck
• Let the data do the work instead!

Traditional Programming vs. Machine Learning
• Traditional programming: Data + Program → Computer → Output
• Machine learning: Data + Output → Computer → Program

ML in a Nutshell
• Tens of thousands of machine learning algorithms, with hundreds more every year
• Every machine learning algorithm has three components:
  – Representation
  – Evaluation
  – Optimization

What is Machine Learning?
• Adapts to / learns from data in order to optimize a performance function
• Can be used to:
  – Extract knowledge from data
  – Learn tasks that are difficult to formalise
  – Create software that improves over time
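To make the "Data + Output → Program" picture concrete, here is a minimal sketch in Python with scikit-learn (the toolkit used in this course). The spam task, the feature, and the threshold are all invented purely for illustration: instead of hand-coding the rule, we hand the computer labelled data and let it fit the rule itself.

    from sklearn.tree import DecisionTreeClassifier

    # Traditional programming: we write the program (the rule) ourselves.
    def is_spam_rule(num_links: int) -> bool:
        return num_links > 5             # hand-picked threshold

    # Machine learning: we supply Data + Output, and the computer
    # produces the "program" -- a fitted model.
    X = [[0], [1], [2], [7], [8], [9]]   # feature: number of links in a message
    y = [0, 0, 0, 1, 1, 1]               # output: 0 = not spam, 1 = spam

    model = DecisionTreeClassifier().fit(X, y)

    print(is_spam_rule(6))               # True: the hand-written rule
    print(model.predict([[6]])[0])       # 1: the learned rule agrees

On this toy data, the tree learns a split near the midpoint between the two groups, so it recovers essentially the same threshold as the hand-written rule, but from examples rather than from a programmer.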
Generic Methods
• Learning from labelled data (supervised learning), e.g. classification, regression, prediction, function approximation
• Learning from unlabelled data (unsupervised learning), e.g. clustering, visualisation, dimensionality reduction
• Learning from sequential data, e.g. speech recognition, DNA data analysis
• Associations
• Reinforcement learning

Statistical Learning
Machine learning methods can be unified within the framework of statistical learning:
– Data is considered to be a sample from a probability distribution.
– Typically, we do not expect perfect learning, only “probably correct” learning.
– Statistical concepts are the key to measuring our expected performance on novel problem instances.

Induction and Inference
• Induction: generalizing from specific examples.
• Inference: drawing conclusions from possibly incomplete knowledge.
Learning machines need to do both.

Machine Learning Applications
(Figure: examples of machine learning applications.)

Claim: The decision to use machine learning is more important than the choice of a particular learning method.

The Machine Learning Framework
• Apply a prediction function to a feature representation of the image to get the desired output:
  f(apple image) = “apple”
  f(tomato image) = “tomato”
  f(cow image) = “cow”
• y = f(x), where y is the output, f the prediction function, and x the image feature.
• Training: given a training set of labeled examples {(x1, y1), …, (xN, yN)}, estimate the prediction function f by minimizing the prediction error on the training set.
• Testing: apply f to a never-before-seen test example x and output the predicted value y = f(x).

Steps
• Training: Training Images → Image Features (+ Training Labels) → Training → Learned Model
• Testing: Test Image → Image Features → Learned Model → Prediction

Classifiers: Nearest Neighbor
(Figure: a test example among training examples from class 1 and class 2.)
• f(x) = label of the training example nearest to x
• All we need is a distance function for our inputs
• No training required!

Classifiers: Linear
• Find a linear function to separate the classes: f(x) = sgn(w · x + b)

Many Classifiers to Choose From
• SVM
• Neural networks
• Naïve Bayes
• Bayesian network
• Logistic regression
• Randomized forests
• Boosted decision trees
• K-nearest neighbor
• RBMs
• Etc.
Which is the best one?

Generalization
(Figure: training set, labels known; test set, labels unknown.)
• How well does a learned model generalize from the data it was trained on to a new test set?
• Components of generalization error:
  – Bias: how much does the average model, over all training sets, differ from the true model? Error due to inaccurate assumptions/simplifications made by the model.
  – Variance: how much do models estimated from different training sets differ from each other?
• Underfitting: the model is too “simple” to represent all the relevant class characteristics
  – High bias and low variance
  – High training error and high test error
• Overfitting: the model is too “complex” and fits irrelevant characteristics (noise) in the data
  – Low bias and high variance
  – Low training error and high test error

No Free Lunch Theorem

Bias-Variance Trade-off
• Models with too few parameters are inaccurate because of a large bias (not enough flexibility).
• Models with too many parameters are inaccurate because of a large variance (too much sensitivity to the sample).
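The trade-off is easy to see empirically. Below is a minimal sketch (not from the slides) that fits polynomials of increasing degree to noisy samples of a sine curve; the dataset, noise level, and degrees are invented for illustration. The degree-1 model underfits (high training and test error), while the degree-15 model overfits (near-zero training error, large test error).

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error

    rng = np.random.default_rng(0)
    X = np.sort(rng.uniform(0, 1, 30)).reshape(-1, 1)            # inputs in [0, 1]
    y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.1, 30)   # noisy targets

    X_train, y_train = X[::2], y[::2]     # every other point for training
    X_test,  y_test  = X[1::2], y[1::2]   # the rest held out for testing

    for degree in (1, 4, 15):             # too simple, about right, too complex
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        model.fit(X_train, y_train)
        train_mse = mean_squared_error(y_train, model.predict(X_train))
        test_mse = mean_squared_error(y_test, model.predict(X_test))
        print(f"degree {degree:2d}: train MSE = {train_mse:.4f}, test MSE = {test_mse:.4f}")

Note that the training error always shrinks as the degree grows; only the held-out test error reveals the overfitting, which is exactly why the framework above separates training from testing.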
Bias-Variance Trade-off
E(MSE) = noise² + bias² + variance, where
– noise²: unavoidable error
– bias²: error due to incorrect assumptions
– variance: error due to variance of training samples

Bias-Variance Tradeoff
(Figure: error versus model complexity. Training error decreases steadily as complexity grows; test error is U-shaped: underfitting (high bias, low variance) at low complexity, overfitting (low bias, high variance) at high complexity.)

(Figure: test error versus model complexity, with one curve for few training examples and one for many; the horizontal axis again runs from high bias / low variance to low bias / high variance.)

Effect of Training Size
(Figure: for a fixed prediction model, error versus the number of training examples: testing error falls and training error rises as both converge toward the generalization error. Slide credit: D. Hoiem.)

The Perfect Classification Algorithm
• Objective function: encodes the right loss for the problem
• Parameterization: makes assumptions that fit the problem
• Regularization: the right level of regularization for the amount of training data
• Training algorithm: can find parameters that maximize the objective on the training set
• Inference algorithm: can solve for the objective function in evaluation

Remember…
• No classifier is inherently better than any other: you need to make assumptions to generalize
• Three kinds of error:
  – Inherent: unavoidable
  – Bias: due to over-simplifications
  – Variance: due to the inability to perfectly estimate parameters from limited data

How to Reduce Variance?
• Choose a simpler classifier
• Regularize the parameters
• Get more training data
(Slide credit: D. Hoiem)

What to Remember About Classifiers
• No free lunch: machine learning algorithms are tools, not dogmas
• Try simple classifiers first
• Better to have smart features and simple classifiers than simple features and smart classifiers
• Use increasingly powerful classifiers with more training data (bias-variance tradeoff)

Representation
• Decision trees
• Sets of rules / logic programs
• Instances
• Graphical models (Bayes/Markov nets)
• Neural networks
• Support vector machines
• Model ensembles
• Etc.

Evaluation
• Accuracy
• Precision and recall
• Squared error
• Likelihood
• Posterior probability
• Cost / utility
• Margin
• Entropy
• K-L divergence
• Etc.

Optimization
• Combinatorial optimization, e.g. greedy search
• Convex optimization, e.g. gradient descent
• Constrained optimization, e.g. linear programming

Types of Learning
• Supervised (inductive) learning: training data includes desired outputs
• Unsupervised learning: training data does not include desired outputs
• Semi-supervised learning: training data includes a few desired outputs
• Reinforcement learning: rewards from a sequence of actions

Homework
• Install the following:
  – Python (scikit-learn)
  – Weka
• Familiarize yourself with the Python IDE
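Once scikit-learn is installed, a quick smoke test such as the sketch below confirms that everything works. It trains a k-nearest-neighbor classifier (the nearest-neighbor idea from earlier, generalized to the k closest examples) on the Iris dataset bundled with scikit-learn; the split ratio, random seed, and k = 3 are arbitrary choices for illustration.

    # Smoke test: does the scikit-learn install work end to end?
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)          # 150 flowers, 4 features, 3 classes
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)   # hold out 30% for testing

    clf = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
    print("Test accuracy:", clf.score(X_test, y_test))

If this script prints an accuracy (it should be well above chance on Iris), the installation is ready for the rest of the course.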