Supporting Material: T0264P25_1
Concept Learning
Li M. Fu
--------------------------------------------------------------------------------------------

Introduction

Concept learning is concerned with acquiring the definition of a general category (concept) from a sample of positive and negative training examples of that category. It can be formulated as the problem of searching through a predefined space of potential hypotheses for one that best fits the training examples. The search can be organized efficiently by the general-to-specific ordering of hypotheses.
Algorithms: Find-S and the version space approach.
The main issue: inductive bias.

Topics

A concept learning task
Concept learning as search
Find-S: finding a maximally specific hypothesis
Version spaces
  o The candidate elimination algorithm
  o The boundary set representation
Inductive bias

Machine Learning Basics

Formal Definitions

Concept learning: c: X -> {0, 1}. What are T, E, and P? Supervised or unsupervised learning?
Training instance representation: (feature vector, positive/negative). What is a feature vector? Given n binary attributes, how many possible instances are there?
Hypothesis representation: a feature vector that admits don't-care values (* or ?). Why the same representation as instances? Given n binary attributes, how many possible hypotheses are there?
Hypothesis function: h: X -> {0, 1}.
Instance space (X) versus hypothesis space (H).
Learning goal: find a hypothesis h in H such that h(x) = c(x) for all x in X.
The inductive learning hypothesis.

Concept Learning as Search

Search for a hypothesis consistent with the training instances.
How should the hypothesis space be represented to allow an efficient search?

General-to-Specific Ordering of Hypotheses

A hypothesis, viewed as a predicate, defines the set of instances that satisfy it.
More_general_than_or_equal_to: hi >=g hj <=> for all x in X, hj(x) = 1 => hi(x) = 1.
More_general_than: hi >g hj <=> hi >=g hj and not (hj >=g hi).
More_specific_than: hi >s hj <=> hj >g hi.
Partial order versus total order: does >=g define a partial or a total order?

Generalization Operators

Introducing variables
Using property hierarchies (specific to general)
Dropping conditions
Introducing disjunctions

Specialization Operators

Instantiating variables with specific values
Using property hierarchies (general to specific)
Adding conditions
Introducing conjunctions

Inductive Inference (Induction)

FIND-S

The algorithm (see the sketch after this section):
(1) Initialize h to the most specific hypothesis in H.
(2) For each positive training instance x: if x does not satisfy h, minimally generalize h so that x satisfies it; otherwise do nothing.
(3) Output hypothesis h.

Issues:
(1) Convergence
(2) Finding the correct concept
(3) Preference for the most specific hypothesis
(4) Multiple maximally specific hypotheses
(5) Data inconsistency
(6) Backtracking?
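To make the Find-S loop concrete, here is a minimal Python sketch for conjunctive hypotheses over discrete attributes. The '?'/None encoding of don't-care and empty constraints, the satisfies helper, and the EnjoySport-style training data are assumptions introduced here for illustration, not part of the original notes.

# Minimal Find-S sketch (conjunctive hypotheses, discrete attributes).
# '?' means "don't care"; None marks the empty (most specific) constraint.

def satisfies(hypothesis, instance):
    """Satisfy(h, x): h(x) = 1 iff every non-'?' constraint matches the instance."""
    return all(hv == '?' or hv == xv for hv, xv in zip(hypothesis, instance))

def find_s(examples):
    """examples: list of (attribute_tuple, label); label True marks a positive example."""
    n = len(examples[0][0])
    h = [None] * n                       # most specific hypothesis: covers nothing
    for x, positive in examples:
        if not positive or satisfies(h, x):
            continue                     # negatives are ignored; covered positives need no change
        for i, value in enumerate(x):
            if h[i] is None:
                h[i] = value             # first positive example: adopt its values
            elif h[i] != value:
                h[i] = '?'               # minimal generalization: drop the conflicting constraint
    return h

if __name__ == "__main__":
    data = [
        (("Sunny", "Warm", "Normal", "Strong"), True),
        (("Sunny", "Warm", "High",   "Strong"), True),
        (("Rainy", "Cold", "High",   "Strong"), False),
    ]
    print(find_s(data))                  # ['Sunny', 'Warm', '?', 'Strong']

Because negative examples are never consulted, the sketch also illustrates issue (5) above: inconsistent data go undetected.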
The Version Space Approach

Consistent(h, D) <=> for all (x, c(x)) in D, h(x) = c(x), where h is a hypothesis and D is a set of training examples.
Satisfy(h, x) <=> h(x) = 1.
Version space: VS_{H,D} = {h in H | Consistent(h, D)}.
The List-Then-Eliminate algorithm.
The Candidate-Elimination algorithm.
Boundary set representation of the version space:
  o The general boundary (with respect to H and D): G = {g in H | Consistent(g, D) and there exists no g' in H such that g' >g g and Consistent(g', D)}, where >g denotes "more general than".
  o The specific boundary (with respect to H and D): S = {s in H | Consistent(s, D) and there exists no s' in H such that s >g s' and Consistent(s', D)}.
Version space representation theorem: VS_{H,D} = {h in H | there exist s in S and g in G such that g >=g h >=g s}.

The Candidate Elimination Algorithm

The algorithm maintains the boundary sets S and G and updates them as each training example arrives (a minimal Python sketch is given at the end of these notes).

Issues:
Convergence? Divergence? Note that the size of the version space is monotonically non-increasing over time.
Convergence to the correct hypothesis.
What training instances should be selected?
How can a partially learned concept (a partially converged version space) be used?
Incremental learning.
Conjunctive versus disjunctive concepts.
Noise tolerance.

Inductive Bias

The inductive bias of a concept learning algorithm L is any minimal set of assertions B such that for all x in X, (B and D and x) => L(x, D), where L(x, D) is the classification of x by L after training on data D.
Types of inductive bias:
  o (Hypothesis) preference bias, or search bias
  o (Language) restriction bias
An unbiased learner: what is the size of its hypothesis space? Why is unbiased learning futile?
Inductive bias of the Candidate Elimination algorithm: B = {c in H}.
A simple proof: since c is in H and c is consistent with D, c is in VS_{H,D}; the algorithm classifies x only when every hypothesis in the version space agrees, so that classification must equal c(x), i.e., c(x) = L(x, D).
Inductive bias from the weakest to the strongest:
  o Rote learning: no bias
  o Candidate-Elimination: c in H
  o Find-S: "c in H" plus "all instances are negative unless the opposite is entailed"

Summary

Concept learning can be cast as a search problem.
The general-to-specific ordering of hypotheses provides a useful search structure.
What are the limitations of the Find-S algorithm?
The version space approach is good for single-concept learning.
What are the weaknesses of the version space approach?
What is the inductive bias of the Candidate Elimination algorithm?
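As referenced in the Candidate Elimination section above, the following is a minimal Python sketch of the algorithm for conjunctive hypotheses over discrete attributes, showing how the boundary sets S and G are updated. The '?'/None encoding, the helper names (covers, more_general_or_equal), the attribute domains, and the training data are assumptions made here for illustration; a full treatment would also prune S and handle empty boundaries more carefully.

# Minimal Candidate-Elimination sketch (conjunctive hypotheses, discrete attributes).
# '?' means "don't care"; None marks the empty (most specific) constraint.

def covers(h, x):
    """h(x) = 1 iff every non-'?' constraint of h matches the instance."""
    return all(hv == '?' or hv == xv for hv, xv in zip(h, x))

def more_general_or_equal(h1, h2):
    """h1 >=g h2: every instance covered by h2 is also covered by h1."""
    if None in h2:                        # h2 covers nothing, so any h1 qualifies
        return True
    return all(a == '?' or a == b for a, b in zip(h1, h2))

def candidate_elimination(examples, domains):
    n = len(domains)
    G = [tuple(['?'] * n)]                # most general boundary
    S = [tuple([None] * n)]               # most specific boundary
    for x, positive in examples:
        if positive:
            G = [g for g in G if covers(g, x)]          # drop inconsistent general hypotheses
            new_S = []
            for s in S:
                if covers(s, x):
                    new_S.append(s)
                else:                                    # minimally generalize s to cover x
                    s2 = tuple(x[i] if s[i] is None else (s[i] if s[i] == x[i] else '?')
                               for i in range(n))
                    if any(more_general_or_equal(g, s2) for g in G):
                        new_S.append(s2)
            S = new_S
        else:
            S = [s for s in S if not covers(s, x)]       # drop inconsistent specific hypotheses
            new_G = []
            for g in G:
                if not covers(g, x):
                    new_G.append(g)
                    continue
                for i in range(n):                       # minimally specialize g to exclude x
                    if g[i] != '?':
                        continue
                    for value in domains[i]:
                        if value != x[i]:
                            g2 = g[:i] + (value,) + g[i + 1:]
                            if any(more_general_or_equal(g2, s) for s in S):
                                new_G.append(g2)
            new_G = list(dict.fromkeys(new_G))           # remove duplicates
            G = [g for g in new_G                        # keep only maximally general members
                 if not any(h != g and more_general_or_equal(h, g) for h in new_G)]
    return S, G

if __name__ == "__main__":
    domains = [("Sunny", "Rainy"), ("Warm", "Cold"), ("Strong", "Weak")]
    data = [
        (("Sunny", "Warm", "Strong"), True),
        (("Rainy", "Cold", "Strong"), False),
        (("Sunny", "Warm", "Weak"),   True),
    ]
    S, G = candidate_elimination(data, domains)
    print("S =", S)   # [('Sunny', 'Warm', '?')]
    print("G =", G)   # [('Sunny', '?', '?'), ('?', 'Warm', '?')]

With this toy data the sketch ends with more than one hypothesis still in the version space, which, by the representation theorem, is exactly the set of hypotheses lying between S and G under >=g.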