COMP721 – Machine Learning
INTRODUCTION
Prof Serestina Viriri
Email: viriris@ukzn.ac.za
Machine Learning: Overview
A Few Quotes
• “A breakthrough in machine learning would be worth ten
Microsofts” (Bill Gates, Chairman, Microsoft)
• “Machine learning is the next Internet” (Tony Tether, Director, DARPA)
• “Machine learning is the hot new thing” (John Hennessy, President, Stanford)
• “Web rankings today are mostly a matter of machine learning”
(Prabhakar Raghavan, Dir. Research, Yahoo)
• “Machine learning is going to result in a real revolution” (Greg
Papadopoulos, CTO, Sun)
• “Machine learning is today’s discontinuity” (Jerry Yang, CEO, Yahoo)
Course Coordinator
Serestina Viriri (Prof.)
– Email: viriris@ukzn.ac.za
– Website:
• https://learn2022.ukzn.ac.za/course/view.php?id=610 (WVL)
• https://learn2022.ukzn.ac.za/course/view.php?id=609 (PMB)
Timetable

Activity              Time                   Venue
Lectures (Tutorial)   Thursday 10:00–12:30   Online
Consultation Time     By appointment         Online
Evaluation
• Test 1: 08 September 2022
• Test 2: 20 October 2022
• Assignments: (refer to the assignment outline)
• FINAL MARK = Test 1 (30%) + Test 2 (30%) + Assignments (20%) + Project (20%)
Course Objectives
• To provide an in-depth introduction to the two main areas of machine learning: supervised and unsupervised learning.
• To cover some of the main models and algorithms for regression, classification, clustering and decision processes.
Background Requirements
• Mathematical Tools
– Linear algebra, set theory, vectors
– Statistics, probability
– Optimization
• Algorithms and Computer Programming
– A high-level programming language: Python (scikit-learn), Weka, R
– LaTeX (scientific typesetting)
Topics Covered
• Introduction to Machine Learning
• Inductive Learning
• Decision Trees
• Instance-based Learning, MLE and EM Algorithms
• Bayesian Learning
• Neural Networks
• Model Ensembles
• Learning Theory (Deep Learning)
• Support Vector Machines and Kernel Methods
• Clustering and Dimensionality Reduction
Textbooks and Notes
• T. Mitchell, Machine Learning, McGraw-Hill, 1997.
• R. Duda, P. Hart and D. Stork, Pattern Classification, 2nd Edition, Wiley, 2001.
• K.P. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012.
• C. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
• Course Website:
– https://learn2022.ukzn.ac.za/course/view.php?id=610 (WVL)
– https://learn2022.ukzn.ac.za/course/view.php?id=609 (PMB)
Impact of Machine Learning
• Core of ML: making predictions or decisions from data.
• Machine learning is arguably the greatest export from computer science to other scientific fields.
So What Is Machine Learning?
• Automating automation
• Getting computers to program themselves
• Writing software is the bottleneck
• Let the data do the work instead!
Traditional Programming
Data + Program → Computer → Output

Machine Learning
Data + Output → Computer → Program
ML in a Nutshell
• Tens of thousands of machine learning algorithms exist
• Hundreds of new ones appear every year
• Every machine learning algorithm has three components:
– Representation
– Evaluation
– Optimization
What is Machine Learning?
• Adapt to / learn from data
– To optimize a performance function
• Can be used to:
– Extract knowledge from data
– Learn tasks that are difficult to formalise
– Create software that improves over time
Generic methods
• Learning from labelled data (supervised learning)
– E.g. classification, regression, prediction, function approximation
• Learning from unlabelled data (unsupervised learning)
– E.g. clustering, visualisation, dimensionality reduction
• Learning from sequential data
– E.g. speech recognition, DNA data analysis
• Associations
• Reinforcement learning
Statistical Learning
Machine learning methods can be unified within
the framework of statistical learning:
– Data is considered to be a sample from a
probability distribution.
– Typically, we don’t expect perfect learning but
only “probably correct” learning.
– Statistical concepts are the key to measuring our
expected performance on novel problem
instances.
Induction and Inference
• Induction: Generalizing from specific examples.
• Inference: Drawing conclusions from possibly
incomplete knowledge.
Learning machines need to do both.
Machine Learning Applications
Claim:
The decision to use machine learning
is more important than the choice of
a particular learning method.
The machine learning framework
• Apply a prediction function to a feature representation of the
image to get the desired output:
f(image) = “apple”
f(image) = “tomato”
f(image) = “cow”
The machine learning framework
y = f(x)
– y: output
– f: prediction function
– x: image features
• Training: given a training set of labeled examples {(x1,y1), …, (xN,yN)}, estimate the prediction function f by minimizing the prediction error on the training set.
• Testing: apply f to a never-before-seen test example x and output the predicted value y = f(x). See the sketch below.
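A minimal sketch of this train/test loop in scikit-learn (assumed installed per Background Requirements; the iris dataset and logistic-regression model are illustrative choices, not prescribed by the slides):

    # Training: estimate f on labeled examples; Testing: apply f to held-out x.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression

    X, y = load_iris(return_X_y=True)                  # examples x_i, labels y_i
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)           # hold out unseen examples

    f = LogisticRegression(max_iter=1000)
    f.fit(X_train, y_train)                            # minimize error on the training set
    y_pred = f.predict(X_test)                         # y = f(x) on new x
    print("test accuracy:", f.score(X_test, y_test))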
Steps

Training: Training Images → Image Features → (learn, using Training Labels) → Learned Model
Testing: Test Image → Image Features → Learned Model → Prediction
Classifiers: Nearest neighbor
[Figure: training examples from class 1 and class 2, with a test example between them]
f(x) = label of the training example nearest to x
• All we need is a distance function for our inputs
• No training required! (A direct implementation is sketched below.)
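A minimal sketch of the rule f(x) = label of the nearest training example, assuming Euclidean distance (any distance function would do) and a tiny hypothetical 2-D dataset:

    import numpy as np

    def nearest_neighbor_predict(X_train, y_train, x):
        """Return the label of the training example nearest to x."""
        dists = np.linalg.norm(X_train - x, axis=1)  # distance to every training point
        return y_train[np.argmin(dists)]             # label of the closest one

    X_train = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
    y_train = np.array([1, 1, 2, 2])
    print(nearest_neighbor_predict(X_train, y_train, np.array([0.5, 0.2])))  # -> 1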
Classifiers: Linear
• Find a linear function to separate the classes:
f(x) = sgn(w · x + b)
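The same rule in code, with illustrative (not learned) weights; in practice w and b are estimated from data, e.g. by an SVM or logistic regression, covered later:

    import numpy as np

    def linear_classify(w, b, x):
        """f(x) = sgn(w . x + b): which side of the hyperplane x falls on."""
        return np.sign(np.dot(w, x) + b)

    w, b = np.array([1.0, -1.0]), 0.5                    # illustrative parameters
    print(linear_classify(w, b, np.array([2.0, 0.0])))   # -> 1.0
    print(linear_classify(w, b, np.array([0.0, 3.0])))   # -> -1.0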
Many classifiers to choose from
• SVM
• Neural networks
• Naïve Bayes
• Bayesian network
• Logistic regression
• Randomized forests
• Boosted decision trees
• K-nearest neighbor
• RBMs
• Etc.
Which is the best one?
Generalization
[Figure: a training set (labels known) and a test set (labels unknown)]
• How well does a learned model generalize from the data it was trained on to a new test set?
Generalization
• Components of generalization error
– Bias: how much the average model over all training sets differs from the true model
• Error due to inaccurate assumptions/simplifications made by the model
– Variance: how much models estimated from different training sets differ from each other
• Underfitting: model is too “simple” to represent all the relevant class characteristics
– High bias and low variance
– High training error and high test error
• Overfitting: model is too “complex” and fits irrelevant characteristics (noise) in the data
– Low bias and high variance
– Low training error and high test error
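Both regimes can be seen numerically by fitting polynomials of increasing degree to noisy data (a sketch; the dataset, noise level and degrees are all illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0, 1, 30)
    y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, x.shape)    # training targets
    y2 = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, x.shape)   # test targets (fresh noise)

    for degree in (1, 3, 15):                # underfit, reasonable fit, overfit
        coeffs = np.polyfit(x, y, degree)
        train_mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
        test_mse = np.mean((np.polyval(coeffs, x) - y2) ** 2)
        print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")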
No Free Lunch Theorem
• No learning algorithm is universally best: averaged over all possible problems, every algorithm performs the same, so good performance comes from assumptions that match the problem at hand.
Bias-Variance Trade-off
• Models with too few parameters are inaccurate because of a large bias (not enough flexibility).
• Models with too many parameters are inaccurate because of a large variance (too much sensitivity to the sample).
Bias-Variance Trade-off
E(MSE) = noise² + bias² + variance
– noise²: unavoidable error
– bias²: error due to incorrect assumptions
– variance: error due to variance of the training samples
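The three terms can be estimated by Monte Carlo: fit the same model class to many independent training sets and examine the predictions at one query point (a sketch; the true function, model class and settings are all illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    f_true = lambda x: np.sin(2 * np.pi * x)   # assumed true function
    noise_sd, x0 = 0.3, 0.25                   # noise level, query point

    preds = []
    for _ in range(2000):                      # many independent training sets
        x = rng.uniform(0, 1, 20)
        y = f_true(x) + rng.normal(0, noise_sd, x.size)
        w1, w0 = np.polyfit(x, y, 1)           # fixed model class: a straight line
        preds.append(w1 * x0 + w0)

    preds = np.array(preds)
    bias2 = (preds.mean() - f_true(x0)) ** 2   # (mean prediction - true value)^2
    variance = preds.var()                     # spread across training sets
    print(f"noise²={noise_sd**2:.3f}  bias²={bias2:.3f}  variance={variance:.3f}")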
Bias-variance tradeoff
[Figure: training and test error vs. model complexity; test error is U-shaped while training error keeps falling. Low complexity gives underfitting (high bias, low variance); high complexity gives overfitting (low bias, high variance)]
Bias-variance tradeoff
[Figure: test error vs. model complexity, plotted for few and for many training examples; low complexity corresponds to high bias and low variance, high complexity to low bias and high variance]
Effect of Training Size
[Figure: for a fixed prediction model, training and testing error vs. number of training examples; the gap between the curves is the generalization error]
Slide credit: D. Hoiem
The perfect classification algorithm
• Objective function: encodes the right loss for the problem
• Parameterization: makes assumptions that fit the problem
• Regularization: right level of regularization for the amount of training data
• Training algorithm: can find parameters that maximize the objective on the training set
• Inference algorithm: can solve for the objective function in evaluation
Remember…
• No classifier is inherently
better than any other: you
need to make assumptions to
generalize
• Three kinds of error
– Inherent: unavoidable
– Bias: due to over-simplifications
– Variance: due to inability to
perfectly estimate parameters
from limited data
How to reduce variance?
• Choose a simpler classifier
• Regularize the parameters
• Get more training data
Slide credit: D. Hoiem
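The second option in code: ridge regression shrinks the weights through the penalty strength alpha (a sketch; alpha is an illustrative value, normally chosen by cross-validation):

    import numpy as np
    from sklearn.linear_model import LinearRegression, Ridge

    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 10))              # few examples, many features
    y = X[:, 0] + rng.normal(0, 0.5, 30)       # only feature 0 actually matters

    ols = LinearRegression().fit(X, y)
    ridge = Ridge(alpha=10.0).fit(X, y)        # regularize the parameters
    print("OLS weight norm:  ", np.linalg.norm(ols.coef_))
    print("ridge weight norm:", np.linalg.norm(ridge.coef_))  # smaller: lower variance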
What to remember about classifiers
• No free lunch: machine learning algorithms are tools, not
dogmas
• Try simple classifiers first
• Better to have smart features and simple classifiers than
simple features and smart classifiers
• Use increasingly powerful classifiers with more training
data (bias-variance tradeoff)
Representation
• Decision trees
• Sets of rules / logic programs
• Instances
• Graphical models (Bayes/Markov nets)
• Neural networks
• Support vector machines
• Model ensembles
• Etc.
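One representation from the list made concrete: a shallow decision tree fit with scikit-learn (a sketch; the dataset and depth are illustrative):

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    X, y = load_iris(return_X_y=True)
    tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
    print(export_text(tree))   # the learned tree, printed as nested if/else rules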
Evaluation
• Accuracy
• Precision and recall
• Squared error
• Likelihood
• Posterior probability
• Cost / utility
• Margin
• Entropy
• K-L divergence
• Etc.
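A few of these measures computed with scikit-learn on hypothetical true vs. predicted labels (a sketch):

    from sklearn.metrics import accuracy_score, precision_score, recall_score, log_loss

    y_true = [1, 0, 1, 1, 0, 1]
    y_pred = [1, 0, 0, 1, 1, 1]
    print("accuracy: ", accuracy_score(y_true, y_pred))
    print("precision:", precision_score(y_true, y_pred))
    print("recall:   ", recall_score(y_true, y_pred))

    # Likelihood-style measures score predicted probabilities, not hard labels.
    y_prob = [0.9, 0.2, 0.4, 0.8, 0.6, 0.7]
    print("log loss: ", log_loss(y_true, y_prob))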
Optimization
• Combinatorial optimization
– E.g.: Greedy search
• Convex optimization
– E.g.: Gradient descent
• Constrained optimization
– E.g.: Linear programming
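A minimal sketch of the convex case: gradient descent on the mean squared error of a one-parameter linear model (data, step size and iteration count are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(-1, 1, 50)
    y = 3.0 * x + rng.normal(0, 0.1, 50)       # data from y = 3x + noise

    w, lr = 0.0, 0.1
    for _ in range(100):
        grad = 2 * np.mean((w * x - y) * x)    # d/dw of the mean squared error
        w -= lr * grad                         # step downhill
    print("estimated w:", w)                   # close to 3.0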
Types of Learning
• Supervised (inductive) learning
– Training data includes desired outputs
• Unsupervised learning
– Training data does not include desired outputs
• Semi-supervised learning
– Training data includes a few desired outputs
• Reinforcement learning
– Rewards from sequence of actions
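The first two types, contrasted on the same data (a sketch; the iris dataset and the two models are illustrative):

    from sklearn.datasets import load_iris
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.cluster import KMeans

    X, y = load_iris(return_X_y=True)

    supervised = KNeighborsClassifier().fit(X, y)           # uses the desired outputs y
    print(supervised.predict(X[:3]))                        # predicted labels

    unsupervised = KMeans(n_clusters=3, n_init=10).fit(X)   # ignores the labels
    print(unsupervised.labels_[:3])                         # discovered cluster ids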
Homework
• Install the following:
– Python (Scikit-learn)
– Weka
• Familiarize yourself with the Python IDE
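After installing (e.g. with pip install scikit-learn), a quick check that the Python side works; Weka is a separate Java application with its own installer:

    import sklearn
    print("scikit-learn version:", sklearn.__version__)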