CSCI498B/598B
Human-Centered Robotics
September 29, 2014
A quick summary
• Skeleton-based human representations
• Robot learning from data
• Decision Tree
• Naïve Bayesian Network
• Support Vector Machine (SVM)
• Many others…
2
• Robot learning from data
• Decision Tree
• Naïve Bayesian Network
• Support Vector Machine (SVM)
• Many others…
Given a range of learning algorithms,
which is the best?
3
Which learning method is the best?
• Suppose we make no prior assumptions about the
nature of the classification task. Can we expect any
learning method to be superior or inferior overall?
4
Which learning method is the best?
• Suppose we make no prior assumptions about the
nature of the classification task. Can we expect any
learning method to be superior or inferior overall?
• No Free Lunch Theorem: Answer to above
question: NO
5
No Free Lunch Theorem
• Suppose we make no prior assumptions about the
nature of the classification task. Can we expect any
learning method to be superior or inferior overall?
• No Free Lunch Theorem: Answer to above
question: NO
• If the goal is to obtain good generalization
performance, there are no context-independent or
usage-independent reasons to favor one algorithm
over others
6
No Free Lunch Theorem
• Suppose we make no prior assumptions about the
nature of the classification task. Can we expect any
learning method to be superior or inferior overall?
• No Free Lunch Theorem: Answer to above
question: NO
• If one algorithm seems to outperform another in a
particular situation, it is a consequence of its fit to
that particular pattern recognition problem
7
No Free Lunch Theorem
• Even popular algorithms will perform poorly on
some problems, where the learning algorithm and
data distribution do not match well
• In practice, experience with a broad range of
techniques is the best insurance for solving
arbitrary new classification problems
“Essentially, all models are wrong, but
some are useful.”
-- Box, George E. P.
8
Guidelines to choose a learner
• For a new classification problem, what matters
most is prior information, data distribution, size of
the training set, cost, ...
• Some algorithms may be preferred because of their
low complexity, ability to incorporate prior
knowledge, ….
• Principle of Occam’s Razor: given two learners that
perform equally well on the training set, the simpler
learner is expected to do better on the test set
9
A quick summary
• Robot learning from data
• Decision Tree
• Naïve Bayesian Network
• Support Vector Machine (SVM)
• Many others…
• Ensemble methods
10
Ensemble-based Systems in Decision Making
• For many tasks, we often seek a second opinion
before making a decision, sometimes many more
• Consulting different doctors before a major surgery
• Reading reviews before buying a product
• Requesting references before hiring someone
• We consider decisions of multiple experts in our
daily lives
11
Ensemble-based Systems in Decision Making
• Why not follow the same strategy in robot
decision making?
• Such systems are known as multiple-learner
systems, committees of classifiers, mixtures of
experts, or ensemble-based systems
12
Why Ensemble-based Systems?
• Statistical reasons
• A set of classifiers with similar training performances
may have different generalization performances
• Combining outputs of several classifiers reduces the
risk of selecting a poorly performing classifier
• Large volumes of data
• If the amount of data to be analyzed is too large, a
single classifier may not be able to handle it; train
different classifiers on different partitions of data
• Too little data
• Ensemble systems can also be used when there is too
little data, by applying resampling techniques
• Data Fusion
• Given several sets of data from various sources, where
the nature of features is different (heterogeneous
features), training a single classifier may not be
appropriate (e.g., vision, sound, laser, ...)
13
Why Ensemble-based Systems?
• Divide and Conquer: Divide data space into smaller &
easier-to-learn partitions; each classifier learns only one
of the simpler partitions
14
Ensemble-based Systems
• Ensemble based systems provide favorable results
compared to single-expert systems for a broad range
of applications & under a variety of scenarios
• All ensemble systems have two key components:
• A method for generating the component classifiers of the ensemble
• A method for combining the classifier outputs
15
Ensemble-based Systems
• Intuition of combining learners: if each classifier
makes different errors, then their strategic
combination can reduce the total error!
• Need base classifiers whose decision boundaries are
adequately different from those of others
• Such a set of classifiers is said to be diverse
• Popular ensemble based algorithms
• Bagging
• Boosting (AdaBoost)
• Mixture of experts
16
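The intuition above can be made concrete with a quick numerical sketch (not from the slides): if we assume the base classifiers err independently with the same probability, the majority-vote error can be computed directly. The independence assumption is an idealization that real classifiers only approximate.

```python
# Sketch: majority-vote error for T independent classifiers that each err
# with probability p (idealized independence assumption).
from math import comb

def majority_vote_error(T, p):
    """Probability that more than half of T independent classifiers are wrong."""
    return sum(comb(T, k) * p**k * (1 - p)**(T - k)
               for k in range(T // 2 + 1, T + 1))

for T in (1, 5, 11, 21):
    print(T, round(majority_vote_error(T, 0.3), 4))
# With p = 0.3 the ensemble error shrinks as T grows (about 0.16 for T = 5),
# which is why diverse classifiers combined by voting can reduce the total error.
```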
Ensemble-based Systems
• Bagging is a way to decrease the variance of your prediction
by manipulating the training data drawn from your original dataset
• Boosting is an approach that calculates the output using
several different models and then combines the results using a
weighted average.
• By balancing the advantages and pitfalls of these approaches through
the weighting formula, you can obtain good predictive power over a
wider range of input data, using different narrowly tuned models.
(A brief sketch of both strategies follows this slide.)
17
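As a rough illustration of the two strategies (not part of the original slides), the sketch below uses scikit-learn's stock BaggingClassifier and AdaBoostClassifier; the synthetic dataset and all parameter choices are placeholders.

```python
# Sketch: bagging vs. boosting on a synthetic dataset (placeholder data/parameters).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagging: each base learner is trained on a bootstrap resample of the training set.
bagging = BaggingClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

# Boosting (AdaBoost): base learners are trained sequentially and combined
# by a weighted vote.
boosting = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

print("bagging  test accuracy:", bagging.score(X_test, y_test))
print("boosting test accuracy:", boosting.score(X_test, y_test))
```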
Bagging
• Bagging, short for bootstrap aggregating, is one
of the earliest ensemble based algorithms
• It is one of the most intuitive and simplest to
implement, with surprisingly good performance
• How to achieve classifier diversity?
• Use different training sets to train individual classifiers
• How to obtain different training sets?
• Resampling techniques: training subsets are drawn
randomly from the entire training set
18
Bagging - Sampling with Replacement
• Random & overlapping training sets to train three classifiers;
they are combined to obtain a more accurate classification
19
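A minimal sketch of the resampling step above, assuming NumPy and scikit-learn (the dataset, the three-classifier setup, and the decision-tree base learner are placeholders mirroring the slide's figure, not part of the original):

```python
# Sketch: draw three bootstrap samples (with replacement) and train one
# classifier per sample; dataset and base learner are placeholders.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)
rng = np.random.default_rng(0)

classifiers = []
for _ in range(3):
    # Sampling with replacement: the three training sets are random and overlapping.
    idx = rng.choice(len(X), size=len(X), replace=True)
    classifiers.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))
```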
Bagging - Sampling without Replacement
• k-fold data split
20
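One reading of the "k-fold data split" above is to partition the shuffled data into k disjoint blocks, one per classifier; a small NumPy sketch of that partitioning (the sample count and k are placeholders):

```python
# Sketch: split the data indices into k disjoint blocks (sampling without
# replacement); each classifier would be trained on one block.
import numpy as np

rng = np.random.default_rng(0)
n_samples, k = 12, 3
perm = rng.permutation(n_samples)   # shuffle the indices once
folds = np.array_split(perm, k)     # k non-overlapping index blocks
for fold in folds:
    print(sorted(fold))
```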
Bagging
• Method for combining the learner outputs
21
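The combination step on this slide is commonly a plain or weighted vote; below is a minimal, self-contained majority-vote sketch (the prediction matrix stands in for the outputs of already-trained classifiers):

```python
# Sketch: combine classifier outputs by a simple (unweighted) majority vote.
import numpy as np

def majority_vote(predictions):
    """predictions: (n_classifiers, n_samples) array of integer class labels."""
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, predictions)

# Three classifiers, five samples (placeholder labels):
preds = np.array([[0, 1, 1, 0, 1],
                  [0, 1, 0, 0, 1],
                  [1, 1, 1, 0, 0]])
print(majority_vote(preds))   # -> [0 1 1 0 1]
```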
A TED talk from Marco Tempest
Marco Tempest is a Swiss magician based in New York City. He is known for
his multimedia magic and use of interactive technology and computer
graphics in his illusions and presentations.
22