Concept Learning - Binus Repository

Supporting Material: T0264P25_1
Concept Learning
Li M. Fu
--------------------------------------------------------------------------------------------
Introduction

- Concept learning is concerned with acquiring the definition of a general
  category (concept) from a sample of positive and negative training examples
  of the category.
- It can be formulated as a problem of searching through a predefined space of
  potential hypotheses for one that best fits the training examples.
- The search can be organized efficiently using a general-to-specific ordering
  of hypotheses.
- Algorithms: Find-S and Candidate-Elimination (the version space approach).
- The main issue: inductive bias.
Topics

- A concept learning task
- Concept learning as search
- Find-S: finding a maximally specific hypothesis
- Version spaces
  o The Candidate-Elimination algorithm
  o The boundary set representation
- Inductive bias
Machine Learning Basics
Formal Definitions

- Concept learning: c: X -> {0, 1}. What are T, E, and P? Supervised or
  unsupervised learning?
- Training instance representation: (feature vector, positive/negative). What
  is a feature vector? Given n binary attributes, how many possible instances?
- Hypothesis representation: a feature vector that admits a don't-care value
  (* or ?). Why the same representation as instances? Given n binary
  attributes, how many possible hypotheses?
- Hypothesis function: h: X -> {0, 1}.
- Instance space (X) versus hypothesis space (H).
- Learning goal: find a hypothesis h in H such that h(x) = c(x) for all x in X.
- Inductive learning hypothesis: a hypothesis that approximates the target
  concept well over a sufficiently large set of training examples will also
  approximate it well over unobserved instances.
- Concept learning as search: search for a hypothesis consistent with the
  training instances. How should the hypothesis space be represented to allow
  an efficient search?
- General-to-specific ordering of hypotheses (see the sketch after this list):
  o A hypothesis, viewed as a predicate, defines the set of instances that
    satisfy it.
  o More_general_than_or_equal_to: hi >=g hj <=> for all x in X,
    hj(x) = 1 => hi(x) = 1.
  o More_general_than: hi >g hj <=> hi >=g hj and hj not >=g hi.
  o More_specific_than: hi >s hj <=> hj >g hi.
  o Partial order versus complete (total) order: does >=g define a partial or
    a complete order?
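
A minimal Python sketch of these representations, assuming n binary attributes
encoded as 0/1 and conjunctive hypotheses written as tuples over {0, 1, '?'};
the helper names (satisfies, more_general_or_equal) are illustrative.

from itertools import product

# An instance is a tuple of n binary attribute values, e.g. (1, 0, 1).
# A conjunctive hypothesis is a tuple over {0, 1, '?'}; '?' means "don't care".

def satisfies(h, x):
    """Satisfy(h, x): h(x) = 1 iff every non-'?' constraint of h matches x."""
    return all(hc == '?' or hc == xc for hc, xc in zip(h, x))

def more_general_or_equal(h_i, h_j):
    """hi >=g hj: every instance that satisfies hj also satisfies hi."""
    n = len(h_i)
    return all(satisfies(h_i, x)
               for x in product((0, 1), repeat=n) if satisfies(h_j, x))

n = 3
print(2 ** n)        # 2^n = 8 possible instances
print(3 ** n + 1)    # 28 semantically distinct conjunctive hypotheses:
                     # 3^n non-empty tuples plus one "covers nothing" hypothesis

h1 = ('?', 1, '?')   # "second attribute is 1"
h2 = (0, 1, '?')     # "first attribute is 0 AND second attribute is 1"
print(more_general_or_equal(h1, h2))   # True: h1 drops one of h2's conditions
print(more_general_or_equal(h2, h1))   # False
h3 = (1, '?', '?')
print(more_general_or_equal(h1, h3), more_general_or_equal(h3, h1))
# False False: h1 and h3 are incomparable, so >=g is only a partial order
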
Generalization Operators

- Introducing variables
- Using property hierarchies (specific to general)
- Dropping conditions (see the sketch after the Specialization Operators list)
- Introducing disjunctions
Specialization Operators

- Instantiating variables with specific values
- Using property hierarchies (general to specific)
- Adding conditions
- Introducing conjunctions
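
For conjunctive feature-vector hypotheses, two of these operators reduce to
editing a single position of the tuple: dropping a condition replaces a value
with '?', and adding a condition replaces a '?' with a specific value. A small
sketch; the function names are illustrative.

def drop_condition(h, i):
    """Generalization operator: drop the i-th condition (replace it with '?')."""
    h = list(h)
    h[i] = '?'
    return tuple(h)

def add_condition(h, i, value):
    """Specialization operator: constrain the i-th attribute to a specific value."""
    h = list(h)
    h[i] = value
    return tuple(h)

h = (0, 1, '?')
print(drop_condition(h, 0))     # ('?', 1, '?') -- covers strictly more instances
print(add_condition(h, 2, 1))   # (0, 1, 1)     -- covers strictly fewer instances
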
Inductive Inference (Induction)
FIND-S

- The algorithm (a runnable sketch follows the issues list below):
  (1) Initialize h to the most specific hypothesis in H.
  (2) For each positive instance x: if x does not satisfy h, minimally
      generalize h so that it covers x; otherwise do nothing.
  (3) Output hypothesis h.
- Issues:
  (1) Convergence
  (2) Finding the correct concept
  (3) Bias in favor of the most specific hypothesis
  (4) Multiple maximally specific hypotheses (MSHs)
  (5) Data inconsistency
  (6) Backtracking?
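
A minimal Python sketch of the Find-S loop above, assuming conjunctive
hypotheses over binary attributes, '?' as the don't-care value, and None
standing in for the most specific "covers nothing" constraint; the toy data is
illustrative only.

def find_s(examples):
    """Find-S for conjunctive hypotheses; examples = [(instance, label)], label 1 = positive."""
    n = len(examples[0][0])
    h = [None] * n                       # (1) most specific hypothesis: covers nothing
    for x, label in examples:
        if label != 1:                   # Find-S simply ignores negative examples
            continue
        for i, (hc, xc) in enumerate(zip(h, x)):     # (2) minimally generalize h
            if hc is None:
                h[i] = xc                # first positive example: adopt its value
            elif hc != xc:
                h[i] = '?'               # conflicting value: drop this condition
    return tuple(h)                      # (3) output h

# Toy data: the target concept is "second attribute = 1"; attributes are binary.
data = [((1, 1, 0), 1), ((0, 1, 1), 1), ((0, 0, 1), 0)]
print(find_s(data))                      # ('?', 1, '?')
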
The Version Space Approach

- Consistent(h, D) <=> for all (x, c(x)) in D, h(x) = c(x), where h is a
  hypothesis and D is a set of training examples.
- Satisfy(h, x) <=> h(x) = 1.
- Version space: VS_{H,D} = {h in H | Consistent(h, D)}.
- The List-Then-Eliminate algorithm (see the sketch after this list).
- The Candidate-Elimination algorithm.
- Boundary set representation of the version space:
  o The general boundary (wrt H and D): G = {g in H | Consistent(g, D) and
    there exists NO g' such that [g' >g g and Consistent(g', D)]}
    (where >g means "more general than").
  o The specific boundary (wrt H and D): S = {s in H | Consistent(s, D) and
    there exists NO s' such that [s >g s' and Consistent(s', D)]}.
- Version space representation theorem: VS_{H,D} = {h in H | there exist
  s in S and g in G such that g >=g h >=g s}.
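
The List-Then-Eliminate algorithm is the brute-force way to obtain VS_{H,D}:
enumerate every hypothesis in H and keep those consistent with all of D. A
sketch for small binary-attribute spaces, reusing the tuple encoding from the
earlier sketches; the enumeration is exponential in n, which is exactly why
the boundary-set representation matters. (The "covers nothing" hypothesis is
omitted, since it cannot be consistent with any positive example.)

from itertools import product

def satisfies(h, x):
    return all(hc == '?' or hc == xc for hc, xc in zip(h, x))

def consistent(h, examples):
    """Consistent(h, D): h(x) = c(x) for every (x, c(x)) in D."""
    return all(satisfies(h, x) == bool(label) for x, label in examples)

def list_then_eliminate(n, examples):
    """Enumerate all conjunctive hypotheses over n binary attributes, keep the consistent ones."""
    return [h for h in product((0, 1, '?'), repeat=n) if consistent(h, examples)]

data = [((1, 1, 0), 1), ((0, 1, 1), 1), ((0, 0, 1), 0)]
print(list_then_eliminate(3, data))      # [('?', 1, '?')] for this toy data
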
The Candidate Elimination Algorithm
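
A Python sketch of the Candidate-Elimination idea for conjunctive hypotheses
over binary attributes. It is simplified relative to the full algorithm: it
keeps a single maximally specific hypothesis s rather than a set S, and it
does not detect a collapsed (empty) version space on inconsistent data.

def satisfies(h, x):
    """h(x) = 1 iff every non-'?' constraint of h matches x."""
    return all(hc == '?' or hc == xc for hc, xc in zip(h, x))

def geq_g(h1, h2):
    """Syntactic >=g test for conjunctive tuples: h1 covers everything h2 covers."""
    return all(a == '?' or a == b for a, b in zip(h1, h2))

def candidate_elimination(examples, values=(0, 1)):
    n = len(examples[0][0])
    s = None                 # specific boundary (single hypothesis; None = covers nothing)
    G = {('?',) * n}         # general boundary, initialised to the most general hypothesis
    for x, label in examples:
        x = tuple(x)
        if label == 1:                                   # positive example
            G = {g for g in G if satisfies(g, x)}        # remove inconsistent members of G
            if s is None:
                s = x                                    # first positive example
            else:
                s = tuple(sc if sc == xc else '?'        # minimally generalize s
                          for sc, xc in zip(s, x))
        else:                                            # negative example
            new_G = set()
            for g in G:
                if not satisfies(g, x):
                    new_G.add(g)                         # g already excludes x
                    continue
                for i, gc in enumerate(g):               # minimal specializations of g
                    if gc != '?':
                        continue
                    for v in values:
                        if v == x[i]:
                            continue
                        spec = g[:i] + (v,) + g[i + 1:]
                        if s is None or geq_g(spec, s):  # keep only specializations above s
                            new_G.add(spec)
            G = {g for g in new_G                        # keep only maximally general members
                 if not any(h != g and geq_g(h, g) for h in new_G)}
    return s, G

data = [((1, 1, 0), 1), ((0, 1, 1), 1), ((0, 0, 1), 0)]
s, G = candidate_elimination(data)
print("S:", s)   # ('?', 1, '?')
print("G:", G)   # {('?', 1, '?')} -- S and G meet, so the version space has converged
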
Version Space Issues:

- Convergence? Divergence? Note that the size of the version space is
  monotonically non-increasing over time.
- Convergence to the correct hypothesis.
- What training instances should be selected?
- How can a partially learned concept (a partially converged version space)
  be used? (See the sketch after this list.)
- Incremental learning.
- Conjunctive versus disjunctive concepts.
- Noise tolerance.
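
One standard answer to the partially-learned-concept question above: a new
instance can be classified before convergence whenever every hypothesis in the
version space agrees, and that agreement can be tested using only the boundary
sets. A sketch; the three-way return convention is illustrative.

def satisfies(h, x):
    return all(hc == '?' or hc == xc for hc, xc in zip(h, x))

def classify_partial(S, G, x):
    """Classify x using only the boundary sets of a partially learned version space."""
    if all(satisfies(s, x) for s in S):
        return 1      # every hypothesis in the version space covers x
    if not any(satisfies(g, x) for g in G):
        return 0      # no hypothesis in the version space covers x
    return None       # the hypotheses disagree; one option is a vote over the version space

# Boundaries from the Candidate-Elimination toy run above:
S, G = [('?', 1, '?')], [('?', 1, '?')]
print(classify_partial(S, G, (1, 1, 1)))    # 1
print(classify_partial(S, G, (1, 0, 1)))    # 0

# A hand-made, not-yet-converged example: S and G still disagree on some instances.
S2, G2 = [(0, 1, '?')], [('?', '?', '?')]
print(classify_partial(S2, G2, (1, 1, 0)))  # None: the version space members disagree
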
Inductive Bias

- The inductive bias of a concept learning algorithm L is any minimal set of
  assertions B such that
      for all x in X, (B and D and x) |- L(x, D),
  where L(x, D) is the classification of x by L after training on data D and
  |- denotes deductive entailment.
- Types of inductive bias:
  o (Hypothesis) preference bias (or search bias)
  o (Language) restriction bias
- An unbiased learner: what is the size of the hypothesis space? Why is an
  unbiased search futile? (See the arithmetic sketch at the end of this
  section.)
- Inductive bias of the Candidate-Elimination algorithm: B = {c in H}.
  A simple proof: since c is in H and is consistent with D, c is in VS_{H,D};
  the algorithm classifies x only when all members of VS_{H,D} agree, so
  c(x) = L(x, D).
- Inductive bias from the weakest to the strongest:
  o Rote learning: no bias
  o Candidate-Elimination: c in H
  o Find-S: "c in H" plus "all instances are negative unless the opposite is
    entailed"
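
A small arithmetic illustration of the unbiased-learner question above,
assuming n binary attributes: the language-restricted (conjunctive) hypothesis
space is tiny compared with the unbiased space of all subsets of X. Without a
restriction the learner can fit any labeling of the training data, so it
cannot generalize to unseen instances; that is why an unbiased search is
futile.

n = 6
instances = 2 ** n            # |X| = 2^n possible instances
conjunctive = 3 ** n + 1      # conjunctive H: each attribute is 0, 1, or '?',
                              # plus one hypothesis that covers nothing
unbiased = 2 ** instances     # unbiased H: every possible subset of X is a concept

print(instances)     # 64
print(conjunctive)   # 730
print(unbiased)      # 18446744073709551616 = 2**64
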
Summary

- Concept learning can be cast as a search problem.
- The general-to-specific ordering of hypotheses provides a useful search
  structure.
- What are the limitations of the Find-S algorithm?
- The version space approach is good for single-concept learning.
- What are the weaknesses of the version space approach?
- What is the inductive bias of the Candidate-Elimination algorithm?