Machine Learning Topic 10: Computational Learning Theory
Bryan Pardo, Machine Learning: EECS 349, Fall 2009

Overview

• Are there general laws that govern learning?
  – Sample Complexity: How many training examples are needed to learn a successful hypothesis?
  – Computational Complexity: How much computational effort is needed to learn a successful hypothesis?
  – Mistake Bound: How many training examples will the learner misclassify before converging to a successful hypothesis?

Definition

• The true error of hypothesis h, with respect to the target concept c and observation distribution D, is the probability that h will misclassify an instance drawn according to D:

  $error_D(h) = \Pr_{x \sim D}[c(x) \neq h(x)]$

• In a perfect world, we'd like the true error to be 0.

The world isn't perfect

• If we can't provide every instance for training, a consistent hypothesis may still have error on unobserved instances.

  [Figure: instance space X, showing the target concept c, a hypothesis h, and the training set]

• How many training examples do we need to bound the likelihood of error to a reasonable level? When is our hypothesis Probably Approximately Correct (PAC)?

Some terms

• X is the set of all possible instances
• C is the set of all possible concepts c, where $c : X \to \{0,1\}$
• H is the set of hypotheses considered by a learner, $H \subseteq C$
• L is the learner
• D is a probability distribution over X that generates observed instances

Definition: ε-exhausted

IN ENGLISH: The set of hypotheses consistent with the training data T is ε-exhausted if, when you test them on the actual distribution of instances, all consistent hypotheses have error below ε.

IN MATHESE: $VS_{H,T}$ is ε-exhausted for concept c and sample distribution D if

  $\forall h \in VS_{H,T},\ error_D(h) < \epsilon$

A Theorem

IF hypothesis space H is finite, and training set T contains m independent, randomly drawn examples of concept c,
THEN, for any $0 \leq \epsilon \leq 1$,

  $P(VS_{H,T}\ \text{is NOT } \epsilon\text{-exhausted}) \leq |H|\,e^{-\epsilon m}$

Proof of Theorem

If hypothesis h has true error at least ε, the probability of it getting a single random example right is

  $P(\text{h got 1 example right}) \leq 1 - \epsilon$

Ergo, because the m examples are drawn independently, the probability of h getting m examples right is

  $P(\text{h got } m \text{ examples right}) \leq (1 - \epsilon)^m$

Proof of Theorem (continued)

If there are k hypotheses in H with error at least ε, call the probability that at least one of those k hypotheses got all m instances right P(at least one bad h looks good). By the union bound, this probability is BOUNDED by $k(1-\epsilon)^m$:

  $P(\text{at least one bad h looks good}) \leq k(1-\epsilon)^m$

Think about why this is a bound and not an equality.

Proof of Theorem (continued)

Since $k \leq |H|$, it follows that $k(1-\epsilon)^m \leq |H|(1-\epsilon)^m$.
If $0 \leq \epsilon \leq 1$, then $(1-\epsilon)^m \leq e^{-\epsilon m}$. Therefore,

  $P(\text{at least one bad h looks good}) \leq k(1-\epsilon)^m \leq |H|(1-\epsilon)^m \leq |H|\,e^{-\epsilon m}$

Proof complete! We now have a bound on the likelihood that a hypothesis consistent with the training data will have error at least ε.
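To make the theorem tangible, here is a minimal simulation sketch in Python. The setup is an assumption, not from the slides: instances are the integers 0..99 under a uniform D, H is the class of integer threshold hypotheses, and the target concept, trial count, and names like `true_error` and `prob_not_exhausted` are all illustrative choices. It estimates $P(VS_{H,T}$ is NOT ε-exhausted$)$ empirically and compares it with the $|H|e^{-\epsilon m}$ bound.

```python
import math
import random

# Toy setup (an illustrative assumption): instances are 0..99 under a
# uniform D; H is the set of threshold hypotheses h_t(x) = 1 iff x >= t.
X_SIZE = 100
H = range(X_SIZE + 1)
TARGET = 50                      # the true concept c = h_50

def label(t, x):
    """Threshold hypothesis h_t applied to instance x."""
    return x >= t

def true_error(t):
    """error_D(h_t): fraction of X where h_t disagrees with c = h_TARGET."""
    return abs(t - TARGET) / X_SIZE

def prob_not_exhausted(epsilon, m, trials=2000):
    """Estimate P(version space is NOT epsilon-exhausted) by simulation."""
    bad = 0
    for _ in range(trials):
        # Draw m labeled examples i.i.d. from D
        sample = [(x, label(TARGET, x))
                  for x in (random.randrange(X_SIZE) for _ in range(m))]
        # Version space: hypotheses consistent with every training example
        vs = [t for t in H if all(label(t, x) == y for x, y in sample)]
        # Not epsilon-exhausted iff some consistent h has error >= epsilon
        if any(true_error(t) >= epsilon for t in vs):
            bad += 1
    return bad / trials

epsilon, m = 0.1, 50
print("empirical:", prob_not_exhausted(epsilon, m))
print("bound |H|e^(-eps*m):", len(H) * math.exp(-epsilon * m))
```

On this toy class the empirical probability typically comes out far below the bound, a first hint of the weak-bounds issue raised under "Problems with PAC" below.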
Using the theorem

Let's rearrange, to set a bound δ on the likelihood our true error is at least ε and to see how many training examples we need:

  $|H|\,e^{-\epsilon m} \leq \delta$
  $\ln|H| - \epsilon m \leq \ln\delta$
  $\ln|H| + \ln\tfrac{1}{\delta} \leq \epsilon m$
  $m \geq \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)$

Probably Approximately Correct (PAC)

  $m \geq \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)$

where:
  m = the number of training examples
  ε = the worst error we'll tolerate
  δ = the likelihood a hypothesis consistent with the training data will have error at least ε
  |H| = the hypothesis space size

Using the bound

  $m \geq \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)$

Plug in ε, δ, and |H| to get a number of training examples m that will "guarantee" your learner will generate a hypothesis that is Probably Approximately Correct. (A small numerical sketch of this bound, and of the VC-based bound below, appears at the end of this handout.)

NOTE: This assumes that the concept is actually IN H, that H is finite, and that your training set is drawn using distribution D.

Problems with PAC

• The PAC learning framework has 2 disadvantages:
  1) It can lead to weak bounds
  2) The sample complexity bound cannot be established for infinite hypothesis spaces
• We introduce the VC dimension to deal with these problems.

Shattering

Def: A set of instances S is shattered by hypothesis set H iff for every possible concept on S (i.e., every way of labeling the instances in S) there exists a hypothesis h in H that is consistent with that concept.

Can a linear separator shatter this?

  [Figure: a set of labeled points in the plane]

NO! The ability of H to shatter a set of instances is a measure of its capacity to represent target concepts defined over those instances.

Can a quadratic separator shatter this?

  [Figure: a set of labeled points in the plane]

This sounds like a homework problem...

Vapnik-Chervonenkis Dimension

Def: The Vapnik-Chervonenkis dimension VC(H) of hypothesis space H, defined over instance space X, is the size of the largest finite subset of X shattered by H. If arbitrarily large finite sets can be shattered by H, then VC(H) is infinite.

How many training examples are needed?

• Upper bound on m using VC(H) (this uses the HYPOTHESIS space):

  $m \geq \frac{1}{\epsilon}\left(4\log_2\frac{2}{\delta} + 8\,VC(H)\log_2\frac{13}{\epsilon}\right)$

• Lower bound on m using VC(C) (this uses the CONCEPT space):

  $m \geq \max\left[\frac{1}{\epsilon}\log\frac{1}{\delta},\ \frac{VC(C)-1}{32\,\epsilon}\right]$

There's more on this in the textbook.

Mistake bound model of learning

• How many mistakes (misclassified examples) will a learner make before it converges to the correct hypothesis?

No time for this in lecture. Read the book!
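As promised above, here is a minimal sketch of how one might plug numbers into both sample complexity bounds. The example values (ε = 0.1, δ = 0.05, conjunctions over 10 boolean attributes with |H| = 3^10 and VC(H) = 10) are illustrative assumptions, not values from the lecture.

```python
import math

def pac_sample_bound(epsilon, delta, h_size):
    """Finite-H PAC bound: m >= (1/eps)(ln|H| + ln(1/delta))."""
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / epsilon)

def vc_upper_bound(epsilon, delta, vc_h):
    """VC-based bound: m >= (1/eps)(4 log2(2/delta) + 8 VC(H) log2(13/eps))."""
    return math.ceil((4 * math.log2(2.0 / delta)
                      + 8 * vc_h * math.log2(13.0 / epsilon)) / epsilon)

# Example (illustrative): conjunctions over 10 boolean attributes, so
# |H| = 3^10 (each attribute positive, negated, or absent); VC(H) = 10.
print(pac_sample_bound(epsilon=0.1, delta=0.05, h_size=3**10))  # ~140 examples
print(vc_upper_bound(epsilon=0.1, delta=0.05, vc_h=10))
```

Both are worst-case guarantees: they hold for any concept in H and any distribution D, so in practice far fewer examples are often enough.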