CS 416 Artificial Intelligence
Lecture 24: Statistical Learning
Chapter 20

AI: Creating rational agents
The pursuit of autonomous, rational agents
• It's all about search
– Varying amounts of model information: tree searching (informed/uninformed), simulated annealing, value/policy iteration
– Searching for an explanation of observations: used to develop a model

Searching for an explanation of observations
If I can explain observations… can I predict the future?
• Can I explain why ten coin tosses are 6 H and 4 T?
– Can I predict the 11th coin toss?

Running example: Candy
Surprise Candy
• Comes in two flavors
– cherry (yum)
– lime (yuk)
• All candy is wrapped in the same opaque wrapper
• Candy is packaged in large bags containing five different allocations of cherry and lime

Statistics
Given a bag of candy, what distribution of flavors will it have?
• Let H be the random variable corresponding to your hypothesis
– H1 = all cherry, H2 = all lime, H3 = 50/50 cherry/lime
• As you open pieces of candy, each observation D1, D2, D3, … is either cherry or lime
– D1 = cherry, D2 = cherry, D3 = lime, …
• Predict the flavor of the next piece of candy
– If the data caused you to believe H1 was correct, you'd pick cherry

Bayesian Learning
Use the available data to calculate the probability of each hypothesis and make a prediction
• Rather than committing to a single hypothesis, we weight every hypothesis by its relative likelihood when making a prediction
• Probabilistic inference using Bayes' rule:
– P(hi | d) = α P(d | hi) P(hi), i.e., likelihood * hypothesis prior
– The probability of hypothesis hi given the observed sequence d equals the probability of seeing data sequence d generated by hypothesis hi, multiplied by the prior probability that hi is correct

Prediction of an unknown quantity X
• P(X | d) = Σ_i P(X | hi) P(hi | d): how likely X is given d is a combination of how strongly each hypothesis predicts X, weighted by how probable that hypothesis is given d
– Even if a hypothesis strongly predicts X, its prediction is discounted if the hypothesis itself is unlikely to be true given the observation of d

Details of Bayes' rule
• All observations within d are
– independent
– identically distributed
• So the probability of a hypothesis explaining a series of observations d
– is the product of its probabilities of explaining each component: P(d | hi) = Π_j P(dj | hi)

Example
Prior distribution across hypotheses:
– h1 = 100% cherry, prior 0.1
– h2 = 75/25 cherry/lime, prior 0.2
– h3 = 50/50 cherry/lime, prior 0.4
– h4 = 25/75 cherry/lime, prior 0.2
– h5 = 100% lime, prior 0.1
Prediction
• After observing 10 lime candies, P(d | h3) = (0.5)^10

Example
Probabilities for each hypothesis start at the prior values <0.1, 0.2, 0.4, 0.2, 0.1>
Probability of hypothesis h3 as 10 lime candies are observed:
• P(d | h3) * P(h3) = (0.5)^10 * 0.4

Prediction of the 11th candy
If we've observed 10 lime candies, is the 11th lime?
• Build a weighted sum of each hypothesis's prediction, weighted by that hypothesis's posterior
• The weighted sum can become expensive to compute as observations accumulate
– Instead, use only the most probable hypothesis and ignore the others (sketched below)
– MAP: maximum a posteriori
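The following is a minimal Python sketch of this Bayesian update and prediction on the candy example, using the priors <0.1, 0.2, 0.4, 0.2, 0.1> and the ten observed lime candies from the slides. The dictionary layout, the function names (posterior, predict_lime), and the printed checks are illustrative assumptions, not part of the lecture.

```python
# A sketch of Bayesian learning on the candy example (names are illustrative).

# Five hypotheses: (P(cherry) under the hypothesis, prior), as on the slides.
hypotheses = {
    "h1": (1.00, 0.1),   # 100% cherry
    "h2": (0.75, 0.2),   # 75/25 cherry/lime
    "h3": (0.50, 0.4),   # 50/50
    "h4": (0.25, 0.2),   # 25/75
    "h5": (0.00, 0.1),   # 100% lime
}

def posterior(observations):
    """P(h_i | d) = alpha * P(d | h_i) * P(h_i), with the candies in d i.i.d."""
    unnormalized = {}
    for name, (p_cherry, prior) in hypotheses.items():
        likelihood = 1.0
        for candy in observations:
            likelihood *= p_cherry if candy == "cherry" else (1.0 - p_cherry)
        unnormalized[name] = likelihood * prior
    alpha = 1.0 / sum(unnormalized.values())          # normalization constant
    return {name: alpha * v for name, v in unnormalized.items()}

def predict_lime(observations):
    """Bayesian prediction: P(next = lime | d) = sum_i P(lime | h_i) P(h_i | d)."""
    post = posterior(observations)
    return sum((1.0 - hypotheses[name][0]) * p for name, p in post.items())

if __name__ == "__main__":
    d = ["lime"] * 10                                 # ten lime candies observed
    post = posterior(d)
    print(post)                       # h3's entry is (0.5)^10 * 0.4, normalized
    print(predict_lime(d))            # weighted-sum prediction, close to 1
    # MAP shortcut: predict using only the single most probable hypothesis
    map_h = max(post, key=post.get)
    print(map_h, 1.0 - hypotheses[map_h][0])          # h5 -> lime with probability 1
```

Running it on the ten lime candies shows the posterior concentrating on h5, the full Bayesian prediction for lime approaching 1, and the MAP shortcut simply answering with h5's prediction.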
Overfitting
Remember overfitting from the NN discussion?
The number of hypotheses influences predictions
• Too many hypotheses can lead to overfitting

Overfitting Example
Say we've observed 3 cherry and 7 lime
• Consider our 5 hypotheses from before
– the prediction is a weighted average of the 5
• Consider having 11 hypotheses, one for each possible bag composition (0 cherry through 10 cherry)
– The 3/7 hypothesis will get probability 1 and all the others 0

Learning with Data
First, let's talk about parameter learning
• Create a hypothesis hq for candies that says the probability a cherry is drawn is q
– If we unwrap N candies and c are cherry, what is q?
– The (log) likelihood is: L = log P(d | hq) = Σ_j log P(dj | hq) = c log q + (N − c) log(1 − q)

Learning with Data
We want to find the q that maximizes the log-likelihood
• Differentiate L with respect to q and set it to 0: dL/dq = c/q − (N − c)/(1 − q) = 0, which gives q = c/N
• In general, this maximization may not have a closed-form solution, and iterative, numerical methods may be needed (see the sketch below)
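Here is a minimal Python sketch of this maximum-likelihood calculation, assuming 3 cherry candies out of 10 unwrapped. The function names (log_likelihood, ml_estimate) and the grid search standing in for "iterative, numerical methods" are illustrative assumptions.

```python
# A sketch of maximum-likelihood parameter learning for the one-parameter
# candy hypothesis h_q, where P(cherry) = q (names are illustrative).
import math

def log_likelihood(q, c, N):
    """L = c*log(q) + (N - c)*log(1 - q) for c cherry candies out of N."""
    return c * math.log(q) + (N - c) * math.log(1.0 - q)

def ml_estimate(c, N):
    """Closed form from dL/dq = c/q - (N - c)/(1 - q) = 0, i.e. q = c/N."""
    return c / N

if __name__ == "__main__":
    c, N = 3, 10                       # 3 cherry candies out of 10 unwrapped
    q_hat = ml_estimate(c, N)
    print(q_hat)                       # 0.3

    # When no closed form exists, a numerical search does the same job;
    # here a coarse grid search over (0, 1) recovers q = 0.3.
    grid = [i / 1000 for i in range(1, 1000)]
    q_numeric = max(grid, key=lambda q: log_likelihood(q, c, N))
    print(q_numeric)                   # 0.3
```

The grid search is only a stand-in for the iterative methods mentioned on the slide; for this Bernoulli model the closed form q = c/N is exact.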