CSCI498B/598B Human-Centered Robotics
September 29, 2014

A quick summary
• Skeleton-based human representations
• Robot learning from data
  • Decision Tree
  • Naïve Bayesian Network
  • Support Vector Machine (SVM)
  • Many others…

Given a range of learning algorithms, which is the best?

Which learning method is the best?
• Suppose we make no prior assumptions about the nature of the classification task. Can we expect any learning method to be superior or inferior overall?

No Free Lunch Theorem
• The No Free Lunch Theorem answers the question above: NO
• If the goal is to obtain good generalization performance, there are no context-independent or usage-independent reasons to favor one algorithm over others
• If one algorithm seems to outperform another in a particular situation, it is a consequence of its fit to a particular pattern recognition problem
• Even popular algorithms will perform poorly on some problems, where the learning algorithm and data distribution do not match well
• In practice, experience with a broad range of techniques is the best insurance for solving arbitrary new classification problems

“Essentially, all models are wrong, but some are useful.” -- Box, George E. P.
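
The No Free Lunch claim can be checked on a toy problem. The sketch below is an illustration under stated assumptions, not material from the lecture: it assumes 3 binary features, 5 fixed training inputs, a uniform average over all 256 possible target functions, and two hypothetical learners (`majority_learner` and `nearest_neighbor_learner`) invented for the example.

```python
from itertools import product

# All 8 possible inputs with 3 binary features (toy assumption).
INPUTS = list(product([0, 1], repeat=3))
TRAIN_X = INPUTS[:5]   # inputs seen during training (arbitrary choice)
TEST_X = INPUTS[5:]    # inputs never seen during training


def majority_learner(train):
    """Always predict the most common training label (ties go to 1)."""
    ones = sum(label for _, label in train)
    guess = 1 if 2 * ones >= len(train) else 0
    return lambda x: guess


def nearest_neighbor_learner(train):
    """Predict the label of the closest training input (Hamming distance)."""
    def predict(x):
        def distance(pair):
            return sum(a != b for a, b in zip(pair[0], x))
        return min(train, key=distance)[1]
    return predict


def average_off_training_accuracy(learner):
    """Average accuracy on unseen inputs over every possible labeling."""
    correct = 0
    total = 0
    for labels in product([0, 1], repeat=len(INPUTS)):
        target = dict(zip(INPUTS, labels))          # one possible "true" function
        train = [(x, target[x]) for x in TRAIN_X]   # labels the learner gets to see
        predict = learner(train)
        for x in TEST_X:
            correct += int(predict(x) == target[x])
            total += 1
    return correct / total


if __name__ == "__main__":
    for learner in (majority_learner, nearest_neighbor_learner):
        print(learner.__name__, average_off_training_accuracy(learner))
```

Both learners average exactly 0.5 on the unseen inputs. For any fixed set of training labels, the unseen labels take every combination equally often, so a deterministic learner is right on half of them. A learner only gains an edge once some target functions are assumed more likely than others, which is the prior information the guidelines refer to.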

Guidelines to choose a learner
• For a new classification problem, what matters most: prior information, data distribution, size of training set, cost, ...
• Some algorithms may be preferred because of their low complexity, ability to incorporate prior knowledge, ...
• Principle of Occam’s Razor: given two learners that perform equally well on the training set, it is asserted that the simpler learner may do better on the test set

A quick summary
• Robot learning from data
  • Decision Tree
  • Naïve Bayesian Network
  • Support Vector Machine (SVM)
  • Many others…
• Ensemble methods

Ensemble-based Systems in Decision Making
• For many tasks, we often seek a second opinion before making a decision, sometimes many more
  • Consulting different doctors before a major surgery
  • Reading reviews before buying a product
  • Requesting references before hiring someone
• We consider decisions of multiple experts in our daily lives
• Why not follow the same strategy in robot decision making?
• Such systems go by several names: multiple learner systems, committee of classifiers, mixture of experts, ensemble-based systems

Why Ensemble-based Systems?
• Statistical reasons
  • A set of classifiers with similar training performances may have different generalization performances
  • Combining outputs of several classifiers reduces the risk of selecting a poorly performing classifier
• Large volumes of data
  • If the amount of data to be analyzed is too large, a single classifier may not be able to handle it; train different classifiers on different partitions of data
• Too little data
  • Ensemble systems can also be used when there is too little data, by applying resampling techniques
• Data Fusion
  • Given several sets of data from various sources, where the nature of features is different (heterogeneous features), training a single classifier may not be appropriate (e.g., vision, sound, laser, ...)
• Divide and Conquer: divide the data space into smaller, easier-to-learn partitions; each classifier learns only one of the simpler partitions

Ensemble-based Systems
• Ensemble-based systems provide favorable results compared to single-expert systems for a broad range of applications and under a variety of scenarios
• All ensemble systems have two key components:
  • A method for generating the component classifiers of the ensemble
  • A method for combining the classifier outputs
• Intuition of combining learners: if each classifier makes different errors, then their strategic combination can reduce the total error!
• Need base classifiers whose decision boundaries are adequately different from those of others
  • Such a set of classifiers is said to be diverse
• Popular ensemble-based algorithms
  • Bagging
  • Boosting (AdaBoost)
  • Mixture of experts
• Bagging decreases the variance of the prediction by resampling the original dataset to create several training sets
• Boosting computes the output using several different models, trained in sequence so that each concentrates on the examples earlier models got wrong, and then combines their results with a weighted average
• By varying the weighting formula, the strengths and weaknesses of these approaches can be traded off to obtain good predictive power over a wider range of input data, using different narrowly tuned models

Bagging
• Bagging, short for bootstrap aggregating, is one of the earliest ensemble-based algorithms
• It is one of the most intuitive and simplest to implement, with surprisingly good performance
• How to achieve classifier diversity?
  • Use different training sets to train individual classifiers
• How to obtain different training sets?
  • Resampling techniques: training subsets are drawn randomly from the entire training set

Bagging - Sampling with Replacement
• Random, overlapping training sets are used to train three classifiers, whose outputs are combined to obtain a more accurate classification

Bagging - Sampling without Replacement
• k-fold data split

Bagging
• Method for combining the learner outputs

A TED talk from Marco Tempest
Marco Tempest is a Swiss magician based in New York City. He is known for his multimedia magic and use of interactive technology and computer graphics in his illusions and presentations.
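
Returning to the bagging slides, the procedure they describe (resample the training set with replacement, train one classifier per resampled set, combine the outputs) can be sketched in a few lines. This is a minimal sketch under assumptions, not the lecture's implementation: the base learner is a hypothetical 1-D threshold classifier (`train_stump`), the data are made up, and the outputs are combined by simple majority vote, which is one common choice since the slides do not spell out the combination rule.

```python
import random
from collections import Counter


def train_stump(samples):
    """Fit a 1-D threshold classifier (hypothetical base learner).

    Predicts `left` for x below the threshold and `right` otherwise,
    choosing the threshold and orientation with the fewest training errors.
    """
    values = sorted({x for x, _ in samples})
    # Candidates: one threshold below all values, plus midpoints between neighbors.
    thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for t in thresholds:
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x < t else right) != y for x, y in samples)
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    _, t, left, right = best
    return lambda x: left if x < t else right


def bootstrap_sample(samples, rng):
    """Draw len(samples) items uniformly at random, with replacement."""
    return [rng.choice(samples) for _ in samples]


def bagging_fit(samples, n_classifiers, rng):
    """Train one base classifier per bootstrap sample."""
    return [train_stump(bootstrap_sample(samples, rng)) for _ in range(n_classifiers)]


def bagging_predict(classifiers, x):
    """Combine the classifier outputs by majority vote (assumed rule)."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    # Made-up data: class 0 for x in 0..9, class 1 for x in 10..19.
    data = [(float(x), 0) for x in range(10)] + [(float(x), 1) for x in range(10, 20)]
    ensemble = bagging_fit(data, n_classifiers=11, rng=rng)
    print(bagging_predict(ensemble, 3.0), bagging_predict(ensemble, 16.0))
```

An odd number of classifiers avoids tied votes in this two-class setting. Each bootstrap sample leaves out some points and repeats others, so the stumps place their thresholds slightly differently, which is the source of diversity the slides ask for.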