Outline

• Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, November 1998.
Invariant Object Recognition
• The central goal of computer vision research is to detect and recognize objects invariant to scale, viewpoint, illumination, and other changes
(Invariant) Object Recognition
Generalization Performance
• Many classifiers are available
– Maximum likelihood estimation, Bayesian estimation, Parzen windows, k-nearest neighbor, discriminant functions, support vector machines, neural networks, decision trees, ...
– Which method is best for classifying unseen test data?
• Performance is often determined by the features used
• In addition, we are interested in systems that solve a particular problem well
Error Rate on Handwritten Digit Recognition
No Free Lunch Theorem
In the absence of assumptions about the problem, no classifier is uniformly better than any other when performance is averaged over all possible target functions.
No Free Lunch Theorem – cont.
Ugly Duckling Theorem
In the absence of prior information, there is no principled reason to prefer one representation over another.
Bias and Variance Dilemma
• Regression
– Find an estimate of a true but unknown function F(x) based on a dataset D of n samples generated from F(x)
– Bias: the difference between the expected value of the estimate and the true value; a low bias means that on average we accurately estimate F from D
– Variance: the variability of the estimate; a low variance means that the estimate does not change much as the training set varies
Bias-Variance Dilemma
• When the training data is finite, there is an intrinsic problem for any family of classifier functions
– If the family is very generic, i.e., a non-parametric family, it suffers from high variance
– If the family is very specific, i.e., a parametric family, it suffers from high bias
– The central problem is to design a family of classifiers a priori such that both the bias and the variance are low (see the sketch below)
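A minimal Monte Carlo sketch of the dilemma, not from the slides: a rigid (degree-1) and a flexible (degree-9) polynomial family are fit to repeated noisy draws from a known F(x); the choice of F, the noise level, sample size, and degrees are illustrative assumptions.

```python
# Monte Carlo estimate of bias^2 and variance for two model families
# (illustrative constants; F, noise level, and degrees are assumptions).
import numpy as np

rng = np.random.default_rng(0)
F = np.sin                              # the "true but unknown" function
x_eval = np.linspace(0.0, np.pi, 50)    # points where bias/variance are measured

def fit_and_predict(degree, n=30):
    """Draw one training set D of n noisy samples of F and fit a polynomial."""
    x = rng.uniform(0.0, np.pi, n)
    y = F(x) + rng.normal(0.0, 0.2, n)
    return np.polyval(np.polyfit(x, y, degree), x_eval)

for degree in (1, 9):                   # specific (rigid) vs. generic (flexible)
    preds = np.stack([fit_and_predict(degree) for _ in range(500)])
    bias2 = np.mean((preds.mean(axis=0) - F(x_eval)) ** 2)  # squared bias
    var = np.mean(preds.var(axis=0))    # spread of the estimate across datasets
    print(f"degree {degree}: bias^2 = {bias2:.4f}, variance = {var:.4f}")
```

The rigid family shows high bias and low variance; the flexible one the reverse.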
Bias and Variance vs. Model Complexity
Gap Between Training and Test Error
• Typically the error of a classifier on a disjoint test set will be larger than its error on the training set; the gap decreases as
$E_{\mathrm{test}} - E_{\mathrm{train}} = k\,(h/P)^{\alpha}$
– where P is the number of training examples, h is a measure of capacity (model complexity), α is between 0.5 and 1, and k is a constant
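A quick numeric illustration of the gap formula, with toy values of k, h, and α (assumptions), showing the gap shrinking as P grows:

```python
# Toy constants k, h, alpha (assumptions) in the gap formula k * (h / P) ** alpha.
k, h, alpha = 1.0, 100, 0.75
for P in (1_000, 10_000, 100_000):      # more training data -> smaller gap
    print(f"P = {P:>6}: gap = {k * (h / P) ** alpha:.4f}")
```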
Check Reading System
End-to-End Training
Graph Transformer Networks
Training Using Gradient-Based Learning
• A multiple-module system can be trained using a gradient-based method (see the sketch below)
– Similar to the backpropagation used for multilayer perceptrons
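Not in the slides: a minimal sketch of this idea with hypothetical Linear and Tanh modules. Each module exposes forward() and backward(), and the chain rule carries gradients backward through the whole chain, exactly as in backpropagation:

```python
# Gradient-based training of a chain of modules (illustrative sketch).
import numpy as np

class Linear:
    def __init__(self, n_in, n_out, rng):
        self.W = rng.normal(0, 0.1, (n_out, n_in))
    def forward(self, x):
        self.x = x
        return self.W @ x
    def backward(self, grad_out, lr=0.01):
        grad_in = self.W.T @ grad_out               # dE/dx for the earlier module
        self.W -= lr * np.outer(grad_out, self.x)   # gradient step on dE/dW
        return grad_in

class Tanh:
    def forward(self, x):
        self.y = np.tanh(x)
        return self.y
    def backward(self, grad_out, lr=None):
        return grad_out * (1 - self.y ** 2)         # derivative of tanh

rng = np.random.default_rng(0)
modules = [Linear(4, 8, rng), Tanh(), Linear(8, 1, rng)]
x, target = rng.normal(size=4), np.array([0.5])

for _ in range(100):
    h = x
    for m in modules:                 # forward pass through all modules
        h = m.forward(h)
    grad = 2 * (h - target)           # gradient of the squared-error loss
    for m in reversed(modules):       # backward pass: chain rule, module by module
        grad = m.backward(grad)

print("output after training:", h)    # approaches the target 0.5
```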
Convolutional Networks
Handwritten Digit Recognition Using a Convolutional Network
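Not in the slides: a compact sketch of the LeNet-5 layer structure described in the cited paper, written as a modern PyTorch reimplementation (an assumption); the paper's scaled-tanh squashing functions and RBF output layer are replaced here by plain tanh and a linear layer.

```python
# LeNet-5 layer structure (modern reimplementation, details assumed).
import torch
import torch.nn as nn

lenet5 = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5),    # C1: 6 feature maps, 32x32 -> 28x28
    nn.Tanh(),
    nn.AvgPool2d(2),                   # S2: subsampling, 28x28 -> 14x14
    nn.Conv2d(6, 16, kernel_size=5),   # C3: 16 feature maps, 14x14 -> 10x10
    nn.Tanh(),
    nn.AvgPool2d(2),                   # S4: 10x10 -> 5x5
    nn.Conv2d(16, 120, kernel_size=5), # C5: 120 feature maps of size 1x1
    nn.Tanh(),
    nn.Flatten(),
    nn.Linear(120, 84),                # F6: 84 units
    nn.Tanh(),
    nn.Linear(84, 10),                 # output (the paper uses RBF units here)
)
print(lenet5(torch.zeros(1, 1, 32, 32)).shape)  # torch.Size([1, 10])
```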
Training a Convolutional Network
• The loss function used is the mean squared error $E(W) = \frac{1}{P}\sum_{p=1}^{P} y_{D^p}(Z^p, W)$, where $y_{D^p}$ is the output of the RBF unit for the correct class $D^p$ of input pattern $Z^p$
– The training algorithm is stochastic diagonal Levenberg-Marquardt
– The output of each RBF unit is $y_i = \sum_j (x_j - w_{ij})^2$, the squared Euclidean distance between its input vector and its parameter vector
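A small sketch of the RBF output units above (the toy input is an assumption): each class i has a parameter vector w_i, fixed in the paper to a ±1 stylized bitmap of the character, and the unit outputs the squared distance to the F6 activations.

```python
# RBF output units of LeNet-5 (shapes per the paper; toy data assumed).
import numpy as np

def rbf_outputs(x, W):
    """y_i = sum_j (x_j - w_ij)^2 for every class i; x: (84,), W: (10, 84)."""
    return ((x[None, :] - W) ** 2).sum(axis=1)

rng = np.random.default_rng(0)
x = rng.normal(size=84)                      # F6 activations (84 units)
W = rng.choice([-1.0, 1.0], size=(10, 84))   # +/-1 bitmaps, one per class
y = rbf_outputs(x, W)
loss_term = y[3]                             # penalty of the correct class (here 3)
print(y.shape, loss_term)
```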
MNIST Dataset
• 60,000 training images
• 10,000 test images
– There are several different versions of the dataset
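A loading sketch (assumes torchvision is installed) for the dataset described above:

```python
# Download and load MNIST: 60,000 training and 10,000 test images (28x28).
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()
train = datasets.MNIST(root="data", train=True, download=True, transform=to_tensor)
test = datasets.MNIST(root="data", train=False, download=True, transform=to_tensor)
print(len(train), len(test))   # 60000 10000
```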
Experimental Results
Distorted Patterns
• By augmenting the training set with distorted patterns, the test error dropped to 0.8% from 0.95% without deformation
Misclassified Examples
Comparison
Rejection Performance
Number of Operations
(unit: thousands of operations)
Memory Requirements
Robustness
Convolutional Network for Object Recognition
NORB Dataset
Convolutional Network for Object Recognition
Experimental Results
Jittered-Cluttered Dataset
Experimental Results
Face Detection
Multiple Object Recognition
• Based on heuristic over-segmentation
– It avoids making hard segmentation decisions by considering a large number of different segmentations (see the sketch below)
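A toy sketch of over-segmentation (the recognize scorer is hypothetical, not the paper's recognizer): every candidate cut becomes a graph node, every candidate segment an arc, and dynamic programming picks the cheapest path, so no hard segmentation decision is made early.

```python
# Keep all candidate segmentations in a graph; choose the best by DP.
def recognize(segment):
    """Stand-in recognizer: returns (cost, label) for one candidate segment."""
    return (len(segment) - 3) ** 2 + 0.1, segment[0]   # prefers width-3 segments

def best_interpretation(strokes, max_width=4):
    n = len(strokes)
    best = [(float("inf"), "")] * (n + 1)   # best[j] = (cost, labels) up to cut j
    best[0] = (0.0, "")
    for j in range(1, n + 1):               # node j = cut after element j
        for i in range(max(0, j - max_width), j):
            cost, label = recognize(strokes[i:j])
            total = best[i][0] + cost
            if total < best[j][0]:
                best[j] = (total, best[i][1] + label)
    return best[n]

print(best_interpretation("aaabbbccc"))     # (0.3..., 'abc') with the toy scorer
```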
Graph Transformer Network for Character Recognition
Recognition Transformer and Interpretation Graph
Viterbi Training
Discriminative Viterbi Training
Discriminative Forward Training
Space Displacement Neural Networks
• By considering all possible locations, one can avoid explicit segmentation
– Similar to detection and recognition
Space Displacement Neural Networks
• We can replicate convolutional networks at all possible locations (see the sketch below)
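Not in the slides: a sketch (sizes assumed) of why this replication is cheap. Converting the recognizer's fully connected layers into convolutions lets a single forward pass evaluate the network at every horizontal position of a wide input strip:

```python
# Space-displacement idea: fully connected layers rewritten as convolutions,
# so one pass over a 32x100 strip scores many overlapping 32x32 locations.
import torch
import torch.nn as nn

sdnn = nn.Sequential(
    nn.Conv2d(1, 6, 5), nn.Tanh(), nn.AvgPool2d(2),
    nn.Conv2d(6, 16, 5), nn.Tanh(), nn.AvgPool2d(2),
    nn.Conv2d(16, 120, 5), nn.Tanh(),   # C5 as a convolution, not a flatten
    nn.Conv2d(120, 10, 1),              # class scores as a 1x1 convolution
)
scores = sdnn(torch.zeros(1, 1, 32, 100))   # one strip, all positions at once
print(scores.shape)   # torch.Size([1, 10, 1, 18]): 18 overlapping locations
```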
SDNN/HMM System
Graph Transformer Networks and Transducers
On-line Handwriting Recognition System
Comparative Results
Check Reading System
Confidence Estimation
Summary
• By carefully designing systems with the desired invariance properties, one can often achieve better generalization performance by limiting the system's capacity
• Multiple-module systems can often be trained effectively using gradient-based learning methods
– Even though in theory local gradient-based methods are subject to local minima, in practice this does not seem to be a serious problem
– Incorporating contextual information into recognition systems is often critical for real-world applications
• End-to-end training is often more effective