Outline
• Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-Based Learning Applied to Document Recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, November 1998.

Invariant Object Recognition
• The central goal of computer vision research is to detect and recognize objects invariant to scale, viewpoint, illumination, and other changes

(Invariant) Object Recognition

Generalization Performance
• Many classifiers are available
  – Maximum likelihood estimation, Bayesian estimation, Parzen windows, k-nearest neighbor, discriminant functions, support vector machines, neural networks, decision trees, ...
  – Which method is best at classifying unseen test data?
• The performance is often determined by the features
• In addition, we are interested in systems that solve a particular problem well

Error Rate on Handwritten Digit Recognition

No Free Lunch Theorem

No Free Lunch Theorem – cont.

Ugly Duckling Theorem
• In the absence of prior information, there is no principled reason to prefer one representation over another.

Bias and Variance Dilemma
• Regression
  – Find an estimate of a true but unknown function F(x) based on n samples generated by F(x)
  – Bias: the difference between the expected value of the estimate and the true value; a low bias means that on average we accurately estimate F from the data set D
  – Variance: the variability of the estimate; a low variance means that the estimate does not change much as the training set varies

Bias-Variance Dilemma
• When the training data is finite, there is an intrinsic trade-off for any family of classifier functions (made precise by the decomposition below)
  – If the family is very generic, i.e., a non-parametric family, it suffers from high variance
  – If the family is very specific, i.e., a parametric family, it suffers from high bias
  – The central problem is to design a family of classifiers a priori such that both the variance and the bias are low
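The trade-off above can be made precise by the standard squared-error decomposition. A minimal sketch of the identity, assuming a noise-free target F(x) and an estimator g(x; D) trained on data set D:

$$
\mathbb{E}_D\big[(g(x;D)-F(x))^2\big]
= \underbrace{\big(\mathbb{E}_D[g(x;D)]-F(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}_D\big[(g(x;D)-\mathbb{E}_D[g(x;D)])^2\big]}_{\text{variance}}
$$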
Bias and Variance vs. Model Complexity

Gap Between Training and Test Error
• Typically the error rate of a classifier on a disjoint test set is larger than its error rate on the training set; the gap behaves as

  $E_{test} - E_{train} = k\,(h/P)^{\alpha}$

  – where P is the number of training examples, h is a measure of capacity (model complexity), α is between 0.5 and 1, and k is a constant

Check Reading System

End-to-End Training

Graph Transformer Networks

Training Using Gradient-Based Learning
• A multi-module system can be trained using a gradient-based method, similar to the backpropagation used for multilayer perceptrons (see the sketch after these slides)

Convolutional Networks

Handwritten Digit Recognition Using a Convolutional Network

Training a Convolutional Network
• The loss function used is the mean penalty of the correct class over the P training patterns, $E(W) = \frac{1}{P}\sum_{p=1}^{P} y_{D^p}(Z^p, W)$
  – The training algorithm is stochastic diagonal Levenberg-Marquardt
  – Each RBF output is given by $y_i = \sum_j (x_j - w_{ij})^2$ (see the sketch after these slides)

MNIST Dataset
• 60,000 training images
• 10,000 test images
  – There are several different versions of the dataset

Experimental Results

Experimental Results

Distorted Patterns
• By training with artificially distorted patterns, the test error dropped to 0.8% from 0.95% without deformation (an augmentation sketch follows these slides)

Misclassified Examples

Comparison

Rejection Performance

Number of Operations (unit: thousands of operations)

Memory Requirements

Robustness

Convolutional Network for Object Recognition

NORB Dataset

Convolutional Network for Object Recognition

Experimental Results

Jittered-Cluttered Dataset

Experimental Results

Face Detection

Face Detection

Multiple Object Recognition
• Based on heuristic over-segmentation
  – It avoids making hard decisions about segmentation by generating a large number of candidate segmentations (see the Viterbi sketch after these slides)

Graph Transformer Network for Character Recognition

Recognition Transformer and Interpretation Graph

Viterbi Training

Discriminative Viterbi Training

Discriminative Forward Training

Space Displacement Neural Networks
• By considering all possible locations, one can avoid explicit segmentation
  – Similar to combined detection and recognition

Space Displacement Neural Networks
• We can replicate convolutional networks at all possible locations (see the sliding-window sketch after these slides)

Space Displacement Neural Networks

Space Displacement Neural Networks

Space Displacement Neural Networks
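The gradient-based training of a multi-module system, referenced on the "Training Using Gradient-Based Learning" slide, chains each module's Jacobian exactly as backpropagation does for layer stacks. A minimal NumPy sketch under that reading; the Linear/Tanh modules and the training loop are illustrative choices of mine, not the paper's code:

```python
import numpy as np

class Linear:
    """Fully connected module: forward caches its input, backward returns dE/dx."""
    def __init__(self, n_in, n_out, rng):
        self.W = rng.standard_normal((n_out, n_in)) * 0.1
        self.b = np.zeros(n_out)
    def forward(self, x):
        self.x = x
        return self.W @ x + self.b
    def backward(self, grad_out, lr=0.01):
        grad_in = self.W.T @ grad_out               # dE/dx via the module's Jacobian
        self.W -= lr * np.outer(grad_out, self.x)   # stochastic gradient step
        self.b -= lr * grad_out
        return grad_in

class Tanh:
    def forward(self, x):
        self.y = np.tanh(x)
        return self.y
    def backward(self, grad_out, lr=0.01):
        return grad_out * (1.0 - self.y ** 2)       # elementwise Jacobian

# A "system" is just an ordered list of modules; gradients flow back through it.
rng = np.random.default_rng(0)
net = [Linear(4, 8, rng), Tanh(), Linear(8, 2, rng)]

x, target = rng.standard_normal(4), np.array([1.0, -1.0])
for _ in range(100):
    y = x
    for m in net:                  # forward pass, module by module
        y = m.forward(y)
    grad = y - target              # dE/dy for squared error E = 0.5*||y - t||^2
    for m in reversed(net):        # backward pass: chain rule through the modules
        grad = m.backward(grad)
```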
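The "Convolutional Networks" slides rest on two operations: locally connected weighted sums with weights shared across positions (convolution) followed by spatial subsampling. A minimal NumPy sketch of one such stage; the valid-convolution loop and 2x2 average pooling are my simplifications of the paper's layers:

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Slide one kernel over the image; the same weights are used at every position."""
    H, W = img.shape
    kh, kw = kernel.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def subsample2x2(fmap):
    """2x2 average pooling: halves resolution and adds tolerance to small shifts."""
    H, W = fmap.shape
    return fmap[:H - H % 2, :W - W % 2].reshape(H // 2, 2, W // 2, 2).mean(axis=(1, 3))

# Toy usage on a 28x28 "digit": 5x5 kernel -> 24x24 feature map -> 12x12 after pooling.
rng = np.random.default_rng(0)
img = rng.standard_normal((28, 28))
fmap = np.tanh(conv2d_valid(img, rng.standard_normal((5, 5))))
print(subsample2x2(fmap).shape)   # (12, 12)
```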
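The two formulas on the "Training a Convolutional Network" slide translate directly into code. A minimal sketch, assuming (as in LeNet-5) an 84-dimensional penultimate feature vector x and one prototype row per class; the variable names are mine:

```python
import numpy as np

def rbf_outputs(x, W):
    """y_i = sum_j (x_j - w_ij)^2 : squared distance of x to each class prototype."""
    return ((x[None, :] - W) ** 2).sum(axis=1)

def mse_loss(xs, labels, W):
    """E(W) = (1/P) * sum_p y_{D^p}(Z^p, W): mean penalty of the correct class."""
    return np.mean([rbf_outputs(x, W)[d] for x, d in zip(xs, labels)])

# Toy usage: 10 digit classes, 84-dimensional features, 5 hypothetical patterns.
rng = np.random.default_rng(0)
W = rng.standard_normal((10, 84))       # one prototype row per class
xs = rng.standard_normal((5, 84))
labels = np.array([3, 1, 4, 1, 5])
print(mse_loss(xs, labels, W))
```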
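The "Distorted Patterns" improvement comes from augmenting the training set with small random deformations. A hedged sketch of one augmentation step with scipy.ndimage; the paper used affine and other distortions, and the parameter ranges below are guesses of mine, not its settings:

```python
import numpy as np
from scipy import ndimage

def distort(img, rng):
    """Apply a small random rotation, shift, and scaling to one training image."""
    out = ndimage.rotate(img, rng.uniform(-10, 10), reshape=False, order=1)
    out = ndimage.shift(out, rng.uniform(-2, 2, size=2), order=1)
    out = ndimage.zoom(out, rng.uniform(0.9, 1.1), order=1)
    return fit_to(out, img.shape)

def fit_to(img, shape):
    """Crop or zero-pad back to the original shape (top-left aligned for brevity)."""
    canvas = np.zeros(shape)
    h, w = min(shape[0], img.shape[0]), min(shape[1], img.shape[1])
    canvas[:h, :w] = img[:h, :w]
    return canvas

# Toy usage on one 28x28 image.
rng = np.random.default_rng(0)
print(distort(rng.random((28, 28)), rng).shape)   # (28, 28)
```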
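Heuristic over-segmentation, from the "Multiple Object Recognition" slide, turns recognition into a shortest-path search: candidate cut points become graph nodes, each arc carries the recognizer's penalty for one segment hypothesis, and the Viterbi path selects the best joint segmentation and labeling. A minimal sketch over such a DAG, with a hypothetical score() standing in for the recognizer:

```python
import math

def viterbi(n_cuts, score):
    """Best path 0 -> n_cuts-1 in a DAG whose arc (i, j) has penalty
    score(i, j) = min over labels of the recognizer penalty for segment [i, j)."""
    best = [math.inf] * n_cuts
    back = [None] * n_cuts
    best[0] = 0.0
    for j in range(1, n_cuts):
        for i in range(j):                 # every candidate cut pair (i, j)
            cand = best[i] + score(i, j)
            if cand < best[j]:
                best[j], back[j] = cand, i
    path, j = [], n_cuts - 1               # backtrack the winning segmentation
    while j is not None:
        path.append(j)
        j = back[j]
    return best[-1], path[::-1]

# Toy usage: 5 candidate cuts; a made-up penalty favoring two-unit segments.
cost, cuts = viterbi(5, lambda i, j: abs((j - i) - 2) + 0.1)
print(cost, cuts)   # 0.2, [0, 2, 4]
```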
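"Replicating the network at all possible locations," from the Space Displacement slides, is equivalent to sweeping a fixed-size recognizer across the input; because every layer is a convolution or subsampling, the replicas share their computation and the sweep can be done in one wider forward pass. A minimal sketch of the sliding-window view; recognize(), the window size, and the stride are stand-ins of mine:

```python
import numpy as np

def sdnn_sweep(strip, recognize, win=28, stride=4):
    """Apply a fixed-size recognizer at every horizontal offset of a text strip.
    An SDNN produces all of these outputs in a single pass, since its
    convolutional layers can simply be evaluated over the wider input."""
    H, W = strip.shape
    return [(x, recognize(strip[:, x:x + win]))     # (position, class scores)
            for x in range(0, W - win + 1, stride)]

# Toy usage: a 28x100 strip scored by a stand-in recognizer.
rng = np.random.default_rng(0)
strip = rng.standard_normal((28, 100))
outputs = sdnn_sweep(strip, lambda w: np.full(10, w.mean()))
print(len(outputs))   # 19 window positions
```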
SDNN/HMM System

Graph Transformer Networks and Transducers

On-line Handwriting Recognition System

On-line Handwriting Recognition System

Comparative Results

Check Reading System

Confidence Estimation

Summary
• By carefully designing systems with the desired invariance properties, one can often achieve better generalization performance by limiting the system's capacity
• Multi-module systems can often be trained effectively using gradient-based learning methods
  – Even though in theory local gradient-based methods are subject to local minima, in practice this does not seem to be a serious problem
  – Incorporating contextual information into recognition systems is often critical for real-world applications
• End-to-end training is often more effective