Lecture 2: History and Overview of Machine Learning CSC 4510 – Machine Learning Dr. Mary-Angela Papalaskari Department of Computing Sciences Villanova University Course website: www.csc.villanova.edu/~map/4510/ CSC 4510 - M.A. Papalaskari - Villanova University 1 “It won’t truly be an autonomous vehicle until you instruct it to drive to work and it heads to the beach instead.” -Brad Templeton, Software designer and a consultant for the Google project on Autonomous Vehicles -NYTimes 1/24/12 -http://www.nytimes.com/2012/01/24/technology/googles-autonomous-vehicles-draw-skepticism-at-legal-symposium.html?_r=2&nl=technology&emc=techupdateema22 CSC 4510 - M.A. Papalaskari - Villanova University 2 What are the goals of AI research? Artifacts that THINK like HUMANS Artifacts that ACT like HUMANS Artifacts that THINK RATIONALLY Artifacts that ACT RATIONALLY CSC 4510 - M.A. Papalaskari - Villanova University 3 A Bit of History • Arthur Samuel (1959) wrote a program that learnt to play checkers well enough to beat him. CSC 4510 - M.A. Papalaskari - Villanova University 4 1940s Advances in mathematical logic, information theory, concept of neural computation 1943: McCulloch & Pitts Neuron 1948: Shannon: Information Theory 1949: Hebbian Learning cells that fire together, wire together 1950s Early computers. Dartmouth conference coins the phrase “artificial intelligence” and Lisp is proposed as the AI programming language 1950: Turing Test 1956: Dartmouth Conference 1958: Friedberg: Learn Assembly Code 1959: Samuel: Learning Checkers CSC 4510 - M.A. Papalaskari - Villanova University 5 1960s A.I. funding increased (mainly military). Famous quote: “Within a generation ... the problem of creating 'artificial intelligence' will substantially be solved.” Early symbolic reasoning approaches. Logic Theorist, GPS, Perceptrons 1969: Minsky & Papert “Perceptrons” 1970s A.I. “winter” – Funding dries up as people realize this is a hard problem! Limited computing power and dead-end frameworks lead to failures. eg: Machine Translation Failure CSC 4510 - M.A. Papalaskari - Villanova University 6 1980s Rule based “expert systems” used in medical / legal professions. Bio-inspired algorithms (Neural networks, Genetic Algorithms). Again: A.I. promises the world – lots of commercial investment Expert Systems (Mycin, Dendral, EMYCIN Knowledge Representation and reasoning: Frames, Eurisko, Cyc, NMR, fuzzy logic Speech Recognition (HEARSAY, HARPY, HWIM) ML: 1982: Hopfield Nets, Decision Trees, GA & GP. 1986: Backpropagation, Explanation-Based Learning CSC 4510 - M.A. Papalaskari - Villanova University 7 1990s Some concrete successes begin to emerge. AI diverges into separate fields: Computer Vision, Automated Reasoning, Planning systems, Natural Language processing, Machine Learning… …Machine Learning begins to overlap with statistics / probability theory. 1992: Koza & Genetic Programming 1995: Vapnik: Support Vector Machines CSC 4510 - M.A. Papalaskari - Villanova University 8 2000s First commercial-strength applications: Google, Amazon, computer games, routefinding, credit card fraud detection, spam filters, etc… Tools adopted as standard by other fields e.g. biology CSC 4510 - M.A. Papalaskari - Villanova University 9 2010s…. ?????? CSC 4510 - M.A. Papalaskari - Villanova University 10 • Using machine learning to detect spam emails. To: you@gmail.com GET YOUR DIPLOMA TODAY! If you are looking for a fast and cheap way to get a diploma, this is the best way out for you. Choose the desired field and degree and call us right now: For US: 1.845.709.8044 Outside US: +1.845.709.8044 "Just leave your NAME & PHONE NO. (with CountryCode)" in the voicemail. Our staff will get back to you in next few days! ALGORITHM Naïve Bayes Rule mining CSC 4510 - M.A. Papalaskari - Villanova University 11 • Using machine learning to recommend books. ALGORITHMS Collaborative Filtering Nearest Neighbour Clustering CSC 4510 - M.A. Papalaskari - Villanova University 12 • Using machine learning to identify faces and expressions. ALGORITHMS Decision Trees Adaboost CSC 4510 - M.A. Papalaskari - Villanova University 13 • Using machine learning to identify vocal patterns ALGORITHMS Feature Extraction Probabilistic Classifiers Support Vector Machines + many more…. CSC 4510 - M.A. Papalaskari - Villanova University 14 • ML for working with social network data: detecting fraud, predicting click-thru patterns, targeted advertising, etc etc etc . ALGORITHMS Support Vector Machines Collaborative filtering Rule mining algorithms Many many more…. CSC 4510 - M.A. Papalaskari - Villanova University 15 Samuel’s definition of ML is still relevant • Arthur Samuel (1959). Machine Learning: Field of study that gives computers the ability to learn without being explicitly programmed. CSC 4510 - M.A. Papalaskari - Villanova University 16 Tom Mitchell (1998): Well-posed Learning Problem A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E. CSC 4510 - M.A. Papalaskari - Villanova University 17 Defining the Learning Task Improve on task, T, with respect to performance metric, P, based on experience, E. T: Playing checkers P: Percentage of games won against an arbitrary opponent E: Playing practice games against itself T: Recognizing hand-written words P: Percentage of words correctly classified E: Database of human-labeled images of handwritten words T: Driving on four-lane highways using vision sensors P: Average distance traveled before a human-judged error E: A sequence of images and steering commands recorded while observing a human driver. T: Determine which students like oranges or apples P: Percentage of students’ preferences guessed correctly E: Student attribute data CSC 4510 - M.A. Papalaskari - Villanova University 18 Designing a Learning System • Choose the training experience • Choose exactly what is too be learned, i.e. the target function. • Choose a learning algorithm to infer the target function from the experience. • A learning algorithm will also determine a performance measure Learner Environment/ Experience Knowledge Performance Element CSC 4510 - M.A. Papalaskari - Villanova University 19 Quick check: Improve on task, T, with respect to performance metric, P, based on experience, E. Suppose your email program watches which emails you do or do not mark as spam, and based on that learns how to better filter spam. What is the task T in this setting? • Watching you label emails as spam or not spam. •Classifying emails as spam or not spam • The number (or fraction) of emails correctly classified as spam/not spam. • None of the above—this is not a machine learning problem. CSC 4510 - M.A. Papalaskari - Villanova University 20 Machine learning • Supervised Learning – Classification – Regression • Unsupervised learning Others: Reinforcement learning, recommender systems. Also talk about: Practical advice for applying learning algorithms. CSC 4510 - M.A. Papalaskari - Villanova University 21 Machine learning • Supervised Learning – Classification – Regression • Unsupervised learning Others: Reinforcement learning, recommender systems. Also talk about: Practical advice for applying learning algorithms. CSC 4510 - M.A. Papalaskari - Villanova University 22 Classification • Example: Credit scoring • Differentiating between low-risk and high-risk customers from their income and savings Discriminant: IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk CSC 4510 - M.A. Papalaskari - Villanova University 23 Classification • Example: Iris data • 4 attributes – – – – sepal length sepal width petal length petal width • Differentiating between 3 different types of iris CSC 4510 - M.A. Papalaskari - Villanova University 24 Iris Data more plots: CSC 4510 - M.A. Papalaskari - Villanova University 25 Classification Tree CSC 4510 - M.A. Papalaskari - Villanova University 26 Face Recognition Training examples of a person Test images ORL dataset, AT&T Laboratories, Cambridge UK CSC 4510 - M.A. Papalaskari - Villanova University 27 Housing price prediction. 400 300 Price ($) in 1000’s 200 100 0 0 500 1000 1500 2000 2500 Size in feet2 Supervised Learning “right answers” given Regression: Predict continuous valued output (price) CSC 4510 - M.A. Papalaskari - Villanova University 28 Machine learning • Supervised Learning – Classification – Regression • Unsupervised learning Others: Reinforcement learning, recommender systems. Also talk about: Practical advice for applying learning algorithms. CSC 4510 - M.A. Papalaskari - Villanova University 29 Regression • Example: Price of a used car • x : car attributes y : price y = g (x | q ) g ( ) model, q parameters y = wx+w0 30 CSC 4510 - M.A. Papalaskari - Villanova University Regression Applications • Navigating a car: Angle of the steering • Kinematics of a robot arm CSC 4510 - M.A. Papalaskari - Villanova University 31 Supervised Learning: Uses • Prediction of future cases: Use the rule to predict the output for future inputs • Knowledge extraction: The rule is easy to understand • Compression: The rule is simpler than the data it explains • Outlier detection: Exceptions that are not covered by the rule, e.g., fraud CSC 4510 - M.A. Papalaskari - Villanova University 32 Quick check: You’re running a company, and you want to develop learning algorithms to address each of two problems. Problem 1: You have a large inventory of identical items. You want to predict how many of these items will sell over the next 3 months. Problem 2: You’d like software to examine individual customer accounts, and for each account decide if it has been hacked/compromised. Should you treat these as classification or as regression problems? •Treat both as classification problems. •Treat problem 1 as a classification problem, problem 2 as a regression problem. •Treat problem 1 as a regression problem, problem 2 as a classification problem. •Treat both as regression problems. Machine learning • Supervised Learning – Classification – Regression • Unsupervised learning Others: Reinforcement learning, recommender systems. Also talk about: Practical advice for applying learning algorithms. CSC 4510 - M.A. Papalaskari - Villanova University 34 Supervised Learning x 2 x CSC 4510 - M.A. Papalaskari - Villanova University 1 35 Unsupervised Learning x 2 x CSC 4510 - M.A. Papalaskari - Villanova University 1 36 Unsupervised Learning • • • • Learning “what normally happens” No output Clustering: Grouping similar instances Example applications – Customer segmentation – Image compression: Color quantization – Bioinformatics: Learning motifs CSC 4510 - M.A. Papalaskari - Villanova University 37 CSC 4510 - M.A. Papalaskari - Villanova University 38 CSC 4510 - M.A. Papalaskari - Villanova University 39 CSC 4510 - M.A. Papalaskari - Villanova University 40 CSC 4510 - M.A. Papalaskari - Villanova University 41 Genes Individuals CSC 4510 - M.A. Papalaskari - Villanova University [Source: Su-In Lee, Dana Pe’er, Aimee Dudley, George Church, Daphne Koller] 42 Organize computing clusters Social network analysis Image credit: NASA/JPL-Caltech/E. Churchwell (Univ. of Wisconsin, Madison) Market segmentation Astronomical data analysis CSC 4510 - M.A. Papalaskari - Villanova University 43 Quick check: Of the following examples, which would you address using an unsupervised learning algorithm? (Check all that apply.) •Given email labeled as spam/not spam, learn a spam filter. Given a database of customer data, automatically discover market segments and group customers into different market segments. •Given a set of web pages found on the web, automatically detect the ones that are syllabi for AI or software engineering courses •Given a dataset of patients diagnosed as either having diabetes or not, learn to classify new patients as having diabetes or not. •Given a database of nutrition data, automatically discover categories of food items. Machine learning • Supervised Learning – Classification – Regression • Unsupervised learning Others: Reinforcement learning, recommender systems. Also talk about: Practical advice for applying learning algorithms. CSC 4510 - M.A. Papalaskari - Villanova University 45 Reinforcement Learning • • • • • • Learning a policy: A sequence of outputs No supervised output but delayed reward Credit assignment problem Game playing Robot in a maze Multiple agents, partial observability, ... CSC 4510 - M.A. Papalaskari - Villanova University 46 Machine learning • Supervised Learning – Classification – Regression • Unsupervised learning Others: Reinforcement learning, recommender systems. Also talk about: Practical advice for applying learning algorithms. CSC 4510 - M.A. Papalaskari - Villanova University 47 Supervised or Unsupervised learning? Iris Data Summary • ML grew out of work in AI Optimize a performance criterion using example data or past experience. • Types of learning – Supervised – Unsupervised • Role of Statistics: Inference from a sample • Role of Computer science: – Data representation and modeling – Efficient algorithms to solve optimization problems – Representing and evaluating the model for inference CSC 4510 - M.A. Papalaskari - Villanova University 49 Resources: Datasets • UCI Repository: http://www.ics.uci.edu/~mlearn/MLRepository.html • UCI KDD Archive: http://kdd.ics.uci.edu/summary.data.application.html • Statlib: http://lib.stat.cmu.edu/ • Delve: http://www.cs.utoronto.ca/~delve/ CSC 4510 - M.A. Papalaskari - Villanova University 50 Resources: Journals • • • • • • Journal of Machine Learning Research www.jmlr.org Machine Learning Neural Computation Neural Networks IEEE Transactions on Neural Networks IEEE Transactions on Pattern Analysis and Machine Intelligence • Annals of Statistics • Journal of the American Statistical Association • ... CSC 4510 - M.A. Papalaskari - Villanova University 51 Resources: Conferences • • • • • • International Conference on Machine Learning (ICML) European Conference on Machine Learning (ECML) Neural Information Processing Systems (NIPS) Uncertainty in Artificial Intelligence (UAI) Computational Learning Theory (COLT) International Conference on Artificial Neural Networks (ICANN) • International Conference on AI & Statistics (AISTATS) • International Conference on Pattern Recognition (ICPR) • ... Some of the slides in this presentation are adapted from: • Prof. Frank Klassner’s ML class at Villanova • the University of Manchester ML course http://www.cs.manchester.ac.uk/ugt/COMP24111/ • The Stanford online ML course http://www.ml-class.org/ CSC 4510 - M.A. Papalaskari - Villanova University 52