CSE847 Machine Learning, Spring 2013
Instructor: Rong Jin (office hour: Tuesday 4:00pm-5:00pm)
TA: Qiaozi Gao (office hour: Thursday 4:00pm-5:00pm)
Textbooks: Machine Learning; The Elements of Statistical Learning; Pattern Recognition and Machine Learning; many topics are drawn from papers
Web site: http://www.cse.msu.edu/~cse847

Requirements
• ~10 homework assignments
• Course project — topic: visual object recognition; data: over one million images with extracted visual features; objective: build a classifier that automatically identifies the class of objects in images
• Midterm exam and final exam

Goal
• Familiarize you with the state of the art in machine learning: breadth (many different techniques) and depth (the course project and hands-on experience)
• Develop a machine-learning way of thinking: learn how to model real-world problems with machine learning techniques and how to deal with practical issues

Course Outline
• Theoretical aspects: information theory, optimization theory, probability theory, learning theory
• Practical aspects: supervised learning algorithms, unsupervised learning algorithms, important practical issues, applications

Today's Topics
• Why machine learning?
• Example: learning to play backgammon
• General issues in machine learning

Why Machine Learning?
• Past: most computer programs were written entirely by hand
• Future: computers should be able to program themselves by interacting with their environment

Recent Trends
• Recent progress in algorithms and theory
• A growing flood of online data
• Readily available computational power
• A growing industry

Big Data Challenge
• 2.7 zettabytes (10^21 bytes) of data exist in the digital universe today
• A huge amount of data is generated on the Internet every minute: YouTube users upload 48 hours of video, Facebook users share 684,478 pieces of content, Instagram users share 3,600 new photos (source: http://www.visualnews.com/2012/06/19/how-much-data-created-every-minute/)
• High-dimensional data appears in many applications of machine learning, e.g., fine-grained visual classification [1] with 250,000 features

Why Data Size Matters?
• Matrix completion underlies classification, clustering, and recommender systems
• An n x n matrix of rank r can be perfectly recovered provided the number of observed entries is on the order of r n log^2(n)
• The recovery error can be arbitrarily large if the number of observed entries is below the order of r n log(n)
• [Figure: recovery error versus the number of observed entries; the regime between O(r n log(n)) and O(r n log^2(n)) is unknown]
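The sample-size claim above can be made concrete with a small numerical experiment. The following is a minimal sketch of low-rank matrix completion using alternating least squares, a common heuristic (the r n log^2(n) recovery guarantee quoted above is proved for convex nuclear-norm minimization, not for this heuristic). The matrix size, rank, sampling rate, regularization, and iteration count are illustrative assumptions, not values from the slides.

```python
# Sketch: recover a low-rank matrix from a subset of its entries using
# alternating least squares (ALS). Each update solves a small ridge-regularized
# least-squares problem for one row of a factor matrix.
import numpy as np

rng = np.random.default_rng(0)
n, r = 200, 5                                   # n x n matrix of rank r (assumed sizes)
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))

p = 0.3                                         # fraction of observed entries (assumed)
mask = rng.random((n, n)) < p                   # indicator of observed entries

U = rng.standard_normal((n, r))                 # factor matrices: M is approximated by U @ V.T
V = rng.standard_normal((n, r))
lam = 1e-3                                      # small ridge term for numerical stability

for _ in range(30):                             # alternate least-squares updates
    for i in range(n):                          # update row factor U[i] from observed entries in row i
        cols = mask[i]
        A = V[cols].T @ V[cols] + lam * np.eye(r)
        U[i] = np.linalg.solve(A, V[cols].T @ M[i, cols])
    for j in range(n):                          # update column factor V[j] from observed entries in column j
        rows = mask[:, j]
        A = U[rows].T @ U[rows] + lam * np.eye(r)
        V[j] = np.linalg.solve(A, U[rows].T @ M[rows, j])

rel_err = np.linalg.norm(U @ V.T - M) / np.linalg.norm(M)
print(f"observed entries: {int(mask.sum())}, relative recovery error: {rel_err:.2e}")
```

Rerunning the sketch with a much smaller sampling rate p illustrates the other side of the claim: with too few observed entries the recovery error grows sharply.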
Alibaba Small and Micro Financial Services
• It is difficult for small and medium businesses to access finance: small loan amounts, tedious loan-approval procedures, low approval rates, and long cycles
• The service is completely big-data driven, leveraging e-commerce data for financial services

Shipping Insurance for Returned Products
• Insurance contracts have a year-on-year growth rate of 100%: over 1 billion contracts in 2013, and over 100 million contracts in a single day on November 11, 2013
• [Figure: overall rate of compensation over time]

Shipping Insurance for Returned Products: pricing approaches
• Fixed rate: a uniform 5% fixed rate; simple
• Actuarial approach: data-based pricing built solely on historical data and demographics; a pricing model with a few parameters; easy to explain; relatively accurate
• Dynamic pricing: a machine-learned model with millions of features and real-time pricing; highly accurate

Three Niches for Machine Learning
• Data mining: using historical data to improve decisions (e.g., turning medical records into medical knowledge)
• Software applications that are too difficult to program by hand (e.g., autonomous driving, image classification)
• User modeling (e.g., automatic recommender systems)

Typical Data Mining Task
Given:
• 9,147 patient records, each describing a pregnancy and birth
• each record contains 215 features
Task:
• identify classes of future patients at high risk for emergency Cesarean section

Data Mining Results
One of 18 learned rules:
If no previous vaginal delivery, and abnormal 2nd-trimester ultrasound, and malpresentation at admission,
then the probability of emergency C-section is 0.6

Credit Risk Analysis
Learned rules (rendered as code after the Relevant Disciplines list below):
If Other-Delinquent-Accounts > 2 and Number-Delinquent-Billing-Cycles > 1, then Profitable-Customer? = no
If Other-Delinquent-Accounts = 0 and (Income > $30K or Years-of-Credit > 3), then Profitable-Customer? = yes

Programs too Difficult to Program by Hand
• ALVINN drives at 70 mph on highways
• Visual object recognition, e.g., classifying bird images: positive and negative example images are used to train a statistical model, which is then tested on new images

Image Retrieval using Text

Software that Models Users
History (movies the user has already rated):
• A homicide detective and a fire marshal must stop a pair of murderers who commit videotaped crimes to become media darlings. Rating: [shown on slide]
• A biography of sports legend Muhammad Ali, from his early days to his days in the ring. Rating: [shown on slide]
• Benjamin Martin is drawn into the American Revolutionary War against his will when a brutal British commander kills his son. Rating: [shown on slide]
What to recommend?
• A high-school boy is given the chance to write a story about an up-and-coming rock band as he accompanies it on their concert tour. Recommend? No
• A young adventurer named Milo Thatch joins an intrepid group of explorers to find the mysterious lost continent of Atlantis. Recommend? Yes

Netflix Contest

Relevant Disciplines
• Artificial intelligence
• Statistics (particularly Bayesian statistics)
• Computational complexity theory
• Information theory
• Optimization theory
• Philosophy
• Psychology
• …
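Returning to the Credit Risk Analysis example above, the two learned rules can be written directly as a small rule-based classifier. This is only a toy rendering of the rules as stated on the slide; the function name, argument names, and the fallback value for inputs matched by neither rule are assumptions.

```python
# A toy rendering of the two learned credit-risk rules from the slide above.
# Function and argument names, and the "unknown" fallback, are assumptions.
def profitable_customer(other_delinquent_accounts: int,
                        delinquent_billing_cycles: int,
                        income: float,
                        years_of_credit: float) -> str:
    # Rule 1: several delinquent accounts and billing cycles -> not profitable
    if other_delinquent_accounts > 2 and delinquent_billing_cycles > 1:
        return "no"
    # Rule 2: no delinquent accounts and sufficient income or credit history -> profitable
    if other_delinquent_accounts == 0 and (income > 30_000 or years_of_credit > 3):
        return "yes"
    return "unknown"  # neither learned rule fires


print(profitable_customer(3, 2, income=25_000, years_of_credit=1))   # -> no
print(profitable_customer(0, 0, income=40_000, years_of_credit=5))   # -> yes
```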
Today's Topics
• Why machine learning?
• Example: learning to play backgammon
• General issues in machine learning

What is the Learning Problem?
Learning = improving with experience at some task:
• improve over task T
• with respect to performance measure P
• based on experience E
Example: learning to play backgammon — T: play backgammon; P: % of games won in the world tournament; E: the opportunity to play against itself

Backgammon
• More than 10^20 states (boards)
• The best human players see only a small fraction of all boards during a lifetime
• Search is hard because of the dice (branching factor > 100)

TD-Gammon (Tesauro, 1995)
• Trained by playing against itself
• Now approximately equal to the best human players

Learning to Play Chess
• Task T: play chess
• Performance P: percent of games won in the world tournament
• Experience E: what experience? How should it be represented? What exactly should be learned? What specific algorithm should learn it?

Choose a Target Function
• Goal: a policy that maps a board b to a move m
• Choice of value function, taking real values: V(b, m) over board-move pairs, or simply V(b) over boards

Value Function V(b): Example Definition
• If b is a final board that is won: V(b) = 1
• If b is a final board that is lost: V(b) = -1
• If b is not a final board: V(b) = E[V(b*)], where b* is the final board reached by playing optimally from b

Representation of the Target Function V(b)
• A lookup table with one entry per board: no learning, no generalization
• Instead, summarize experience into a compact model, e.g., polynomials or neural networks

Example: Linear Feature Representation
Features:
• pb(b), pw(b) = number of black (white) pieces on board b
• ub(b), uw(b) = number of unprotected black (white) pieces
• tb(b), tw(b) = number of black (white) pieces threatened by the opponent
Linear function:
V(b) = w0 pb(b) + w1 pw(b) + w2 ub(b) + w3 uw(b) + w4 tb(b) + w5 tw(b)
Learning: estimation of the parameters w0, …, w5

Tuning Weights (Gradient Descent Optimization)
Given a board b with predicted value V(b) and desired value V*(b):
• compute the signed error error(b) = V*(b) - V(b)
• for each board feature fi, update wi <- wi + c·error(b)·fi
This update stochastically minimizes the squared error, summed over boards b, of (V*(b) - V(b))^2 (a code sketch of this update appears at the end of these notes).

Obtaining Boards
• Random boards
• Games played by beginners
• Games played by professionals

Obtaining Target Values
• A person provides the value V(b)
• Play until termination; if the outcome is a win, set V(b) <- 1 for all boards in the game; if a loss, V(b) <- -1 for all boards; if a draw, V(b) <- 0 for all boards
• Play one move, b -> b', and set V(b) <- V(b')
• Play n moves, b -> b' -> … -> b^(n), and set V(b) <- V(b^(n))

A General Framework
Machine learning = mathematical modeling (statistics) + finding optimal parameters (optimization)

Today's Topics
• Why machine learning?
• Example: learning to play backgammon
• General issues in machine learning

Important Issues in Machine Learning
Obtaining experience:
• How do we obtain experience?
• How many examples are enough? (PAC learning theory)
Learning algorithms:
• Supervised vs. unsupervised learning
• Which algorithms can approximate which target functions well, and when?
• How does the complexity of the learning algorithm affect learning accuracy?
• Is the target function learnable?
Representing inputs:
• How should the inputs be represented?
• How can irrelevant information be removed from the input representation?
• How can redundancy in the input representation be reduced?
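To make the "Tuning Weights" update concrete, below is a minimal sketch of the six-feature linear value function and the LMS-style weight update wi <- wi + c·error(b)·fi. The board features and training targets are simulated stand-ins (a real backgammon board representation and self-play loop are out of scope), and the learning rate, feature ranges, and "true" weights are illustrative assumptions.

```python
# Sketch of the six-feature linear value function and the LMS weight update
# from the "Tuning Weights" slide. Board features and targets are simulated
# stand-ins; a real backgammon board and self-play are out of scope.
import numpy as np

rng = np.random.default_rng(0)

NUM_FEATURES = 6              # pb, pw, ub, uw, tb, tw
w = np.zeros(NUM_FEATURES)    # weights w0 .. w5, initialized to zero
c = 0.001                     # learning rate (assumed)

def value(features: np.ndarray) -> float:
    """Linear value function V(b) = w0*f0 + ... + w5*f5."""
    return float(w @ features)

# Simulated training data: feature vectors (piece counts in 0..15) with target
# values V*(b) drawn from a fixed "true" weight vector. In TD-style training
# the target for a board would instead come from the value of its successor.
true_w = np.array([0.5, -0.5, -0.2, 0.2, -0.3, 0.3])
for _ in range(5000):
    f = rng.integers(0, 16, size=NUM_FEATURES).astype(float)  # fake board features
    target = float(true_w @ f)                                # stand-in for V*(b)
    error = target - value(f)                                 # signed error V*(b) - V(b)
    w += c * error * f                                        # wi <- wi + c * error * fi

print("learned weights:", np.round(w, 3))
print("true weights:   ", true_w)
```

With enough samples the learned weights approach the weights used to generate the targets, which is the sense in which the update stochastically minimizes the squared error.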