MACHINE LEARNING 1. Introduction What is Machine Learning? 2 Need an algorithm to solve a problem on a computer An Algorithm is a sequence of instructions to transform input from output Example: Sort list of numbers Input: set of numbers Output: ordered list of numbers Many algorithms for the same task May be interested in finding the most efficient Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1) What is Machine Learning? 3 Don’t have an algorithm for some tasks Example: Tell the spam e-mail for legitimate e-mails Know the input (an email) and output (yes/no) Don’t know how to transform input to output Definition of spam may change over the time and from individual to individual We don’t have a knowledge, replace it with data Can easily produce large amount of examples Want a computer to extract an algorithm from the examples Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1) What is Machine Learning? 4 Believe that there is a process explaining the data We don’t know details about the process we know it’s not random Example: Consumer Behavior Frequently buy beer with chips Buy more ice-cream ins summer There is certain patterns in data Rarely can’t indentify patterns completely Can construct good and useful approximation Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1) Approximations to Patterns 5 May not explain everything Still detect some patterns and regularities Use patterns Understand the process Make a prediction Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1) Data Mining: Application of ML to large databases 6 Retail: Market basket analysis, Customer relationship management (CRM) Finance: Credit scoring, fraud detection Manufacturing: Optimization, troubleshooting Medicine: Medical diagnosis Telecommunications: Quality of service optimization Bioinformatics: Motifs, alignment Web mining: Search engines ... Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1) Examples of Ml Applications 7 Learning association: Basket Analysis If people buy X they typically buy Y There is a customer who buys X and don’t buy Y He/She is a potential Y customer Find such customers and target them for cross-selling Find an association rule: P(Y|X) D customer attributes (e. g.age, gender) P(Y|X,D) Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1) Association Rules examples 8 Bookseller instead of supermarket Products are books or authors Web portal Links the user is likely to click Pre-download pages in advance Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1) Credit Scoring 9 Bank want to predict a risk associated with loan Probability that customer will default given the information about the customer Income, Savings, profession Association between customer attributes and his risk Fits a model to the past data to be able to calculate a risk for a new application Accept/Refuse application Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1) Classification Problem 10 Two classes of customers: low-risk and high-risk Input: information about a customer Output: assignment to one of two classes Example of classification rule IF income> θ1 AND savings> θ2 THEN low-risk ELSE high-risk Discriminant: function that separates the examples of different classes Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1) Discriminant Rule 11 Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1) Discriminant Rule 12 Prediction: User rule for novel instances In some instances may want to calculate probabilities instead of 0/1 P(Y|X), P(Y=1|X=x) =0.8 , customer has an 80% probability of being high-risk Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1) Pattern Recognition 13 Optical character recognition (OCR) , recognizing character codes from their images Number of classes as many as number of images we would like to recognize Handwritten characters (zip code on envelopes or amounts on checks) Different handwriting styles, character sizes, pen or pencil We don’t have a formal description that covers all A’s characters and none of non-A’s Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1) OCR 14 All A have something in common Extract pattern from examples Use redundancy in human languages Word is a sequence of characters Not all sequences are equally likely Can still r?ad some w?rds ML algorithms should model dependencies among characters in a word and word in the sequence Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1) Face recognition 15 Input : image Classes: people to be recognized Learning program: learn to associate faces to identities More difficult then OCR More classes Images are larger Differences in pose and lightening cause significant changes in image Occlusions: Glasses, beard Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1) Face Recognition 16 Training examples of a person Test images AT&T Laboratories, Cambridge UK http://www.uk.research.att.com/facedatabase.html Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1) Medical Diagnosis 17 Input: Information about the patient Age, past medical history, symptoms Output: illnesses Can apply some additional tests Costly and inconvenient Some information might be missing Can decide to apply test if believe valuable High price of wrong decision Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1) Speech Recognition 18 Input: acoustic signal Output: words Different accents and voices Can integrate language models Combine with lips movement Sensor fusion Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1) Knowledge Extraction 19 The rule is simpler than the data Example: Discriminant separating low-risk and high risk customer helps to define low risk customer Target low risk customer through advertising Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1) Compression 20 Explanation simpler than data Discard the data , keep the rule Less memory Example: Image compression Learn most common colors in image Represent slightly different but similar colors by single value Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1) Outlier detection 21 Find instances which do not obey rules Interesting not in a rule but an exception not covered by the rule Examples: Learn properties of standard credit card transactions Outlier is a suspected fraud Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1) Why “Learn” ? 22 Machine learning is programming computers to optimize a performance criterion using example data or past experience. There is no need to “learn” to calculate payroll Learning is used when: Human expertise does not exist (navigating on Mars), Humans are unable to explain their expertise (speech recognition) Solution changes in time (routing on a computer network) Solution needs to be adapted to particular cases (user biometrics) Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1) What We Talk About When We Talk About“Learning” 23 Learning general models from a data of particular examples Data is cheap and abundant (data warehouses, data marts); knowledge is expensive and scarce. Example in retail: Customer transactions to consumer behavior: People who bought “Da Vinci Code” also bought “The Five People You Meet in Heaven” (www.amazon.com) Build a model that is a good and useful approximation to the data. Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1) What is Machine Learning? 24 Optimize a performance criterion using example data or past experience. Role of Statistics: Inference from a sample Role of Computer science: Efficient algorithms to Solve the optimization problem Representing and evaluating the model for inference Based On E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1) Regression 25 Example: Price of a used car x : car attributes y : price y = g (x | θ ) g ( ) model, θ parameters y = wx+w0 Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1) Supervised Learning 26 Regression and classification are supervised learning problems There is an input and output Need to learn mapping from input to output ML approach: assume a model defined up to a set of parameters y = g(x|θ) Machine Learning Program optimize the parameters to minimize the error Linear model might be two restrictive (large error) Use more complex models y = w2x2 + w1x + w0 Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1) Supervised Learning: Uses 27 Prediction of future cases: Use the rule to predict the output for future inputs Knowledge extraction: The rule is easy to understand Compression: The rule is simpler than the data it explains Outlier detection: Exceptions that are not covered by the rule, e.g., fraud Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1) Unsupervised Learning 28 Learning “what normally happens” No output, only input Statistics: Density estimation Clustering: Grouping similar instances Example applications Customer segmentation in CRM Image compression: Color quantization Document Clustering Based On E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1) Reinforcement Learning 29 Learning a policy: A sequence of outputs Single action is not important Action is good if its part of a good policy Learn from past good policies Delayed reward Example: Game playing Robot in a maze Reach the goal state from an initial state Based on Introduction to Machine Learning © The MIT Press (V1.1)