
Kernel Analysis of Deep Networks
By:
Grégoire Montavon
Mikio L. Braun
Klaus-Robert Müller
(Technical University of Berlin)
JMLR 2011
Presented by:
Behrang Mehrparvar
(University of Houston)
April 8th, 2014
Roadmap

Deep Learning

Goodness of Representations

Measuring goodness

Role of architecture
Deep Learning?

Distributed representation
– Fewer examples needed per region
– Captures global structure

Depth

Abstraction
– Efficient representation
– Higher-level features

Flexibility
– Incorporate prior knowledge
Distributed Representation [1]
Depth [2]
Abstraction [?]
Problem Specification

Deep Learning is still a Black Box!

Theoretical aspect
– Analytical arguments
  • e.g. analysis of depth

Experimental results
– e.g. studying depth in sum-product networks
– e.g. performance in application domains

Visualization
– e.g. measuring invariance
Kernel Methods

Decouple the learning algorithm from the data representation

Kernel operator (see the sketch below):
– Measures similarity between data points
– Encodes all the prior knowledge of the learning problem

In this paper:
– Not used as a learning machine
– Used as an abstraction tool to model the deep network
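As a concrete illustration of a kernel operator, the minimal sketch below computes a Gaussian (RBF) kernel matrix; the kernel type, width sigma, and toy data are assumptions made here for illustration, not prescribed by the slides.

```python
import numpy as np

def rbf_kernel(X, Z, sigma=1.0):
    """Gaussian (RBF) kernel: k(x, z) = exp(-||x - z||^2 / (2 * sigma^2)).

    The kernel measures similarity between points; the width sigma encodes
    a prior notion of which variations in the input count as 'local'.
    """
    sq_dists = (
        np.sum(X ** 2, axis=1)[:, None]
        + np.sum(Z ** 2, axis=1)[None, :]
        - 2.0 * X @ Z.T
    )
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

# Toy usage: pairwise similarities between a few random points.
X = np.random.randn(5, 10)
K = rbf_kernel(X, X, sigma=2.0)
print(K.shape)  # (5, 5), symmetric, with ones on the diagonal
```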
Kernel Methods (cont.)

Kernel methods are used here to:
– model the deep network
– quantify the goodness of representations
– quantify the evolution of good representations across layers
Hypothesis
1) Representations become simpler and more accurate with depth
2) The structure of the network (its restrictions) determines how quickly good representations are formed
– Evolution from the distribution of pixels to the distribution of classes
Problem Specification

Problem: the role of depth in the goodness of representations

Challenge: defining and measuring goodness

Solution:
– Simplicity
  • Dimensionality: number of leading kernel principal components
  • Number of local variations
– Accuracy
  • Classification error
Hypothesis (Cont.)
Method
1) Train the deep network
2) Infer the representation at each layer
3) Apply kernel PCA to each layer's representation
4) Project the data points onto the first d eigenvectors
5) Analyze the results (see the sketch below)
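A minimal end-to-end sketch of these five steps, using scikit-learn with a small MLP on the digits dataset as stand-ins for the paper's networks and datasets; the layer sizes, the RBF kernel width (gamma), the choices of d, and the logistic-regression readout are illustrative assumptions, not the authors' settings.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import KernelPCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# 1) Train a (small) deep network.
X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
net = MLPClassifier(hidden_layer_sizes=(128, 64, 32), max_iter=400,
                    random_state=0).fit(X_tr, y_tr)

# 2) Infer the representation at each hidden layer (manual forward pass,
#    matching MLPClassifier's default ReLU activations).
def layer_representations(net, X):
    reps, h = [], X
    for W, b in zip(net.coefs_[:-1], net.intercepts_[:-1]):
        h = np.maximum(h @ W + b, 0.0)
        reps.append(h)
    return reps

# 3)-5) Per layer: kernel PCA, project onto the d leading components,
#       then measure the error of a simple classifier on that projection.
for layer, (R_tr, R_te) in enumerate(zip(layer_representations(net, X_tr),
                                         layer_representations(net, X_te)), 1):
    for d in (4, 16, 64):
        kpca = KernelPCA(n_components=d, kernel="rbf", gamma=1e-3).fit(R_tr)
        clf = LogisticRegression(max_iter=1000).fit(kpca.transform(R_tr), y_tr)
        err = 1.0 - clf.score(kpca.transform(R_te), y_te)
        print(f"layer {layer}, d = {d:3d}: test error {err:.3f}")
```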
Method (Analysis)
Why Kernels?
1) Incorporating prior knowledge
2) Measurable simplicity and accuracy
3) Theoretical framework and convergence bounds [3]
4) Flexibility
Dimensionality and Complexity
Dimensionality and Complexity (cont.)
Intuition

Accuracy
– Task-relevant information

Simplicity
– Number of allowed local variations in the input space
– However, does not explain domain-specific regularities
– Robust to the number of samples
  • in contrast to the number of support vectors
Effects of Kernel mapping
Experiment Setup

Datasets
– MNIST
– CIFAR

Tasks
– Supervised learning
– Transfer learning

Architectures
– Multilayer perceptron (MLP)
– Pretrained multilayer perceptron (PMLP)
– Convolutional neural network (CNN)
Effect of Settings
Effect of Depth (Hyp. 1)
Observation

Higher layers yield
– more accurate representations
– simpler representations
Architectures (illustrative sketch below)

Multilayer perceptron (MLP)
– No preconditioning on the learning problem
– Prior: none

Pretrained multilayer perceptron (PMLP)
– Better captures the underlying structure of the input
– Already contains a certain part of the solution
– Prior: a generative model of the input

Convolutional neural network (CNN)
– Prior: spatial invariance
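To make the priors concrete, here is a minimal PyTorch sketch contrasting an MLP (no structural prior) with a CNN (spatial-invariance prior via shared local filters and pooling); the MNIST-like 28x28 input and the layer sizes are assumptions for illustration, not the networks used in the paper.

```python
import torch.nn as nn

# MLP: fully connected layers, no prior about the spatial structure of the input.
mlp = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512), nn.ReLU(),
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 10),
)

# CNN: convolutions and pooling encode a spatial-invariance prior
# (shared local filters, translation-tolerant subsampling).
cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 10),
)

# A PMLP reuses the MLP layout but initializes its weights from an
# unsupervised generative pretraining stage (its prior) before fine-tuning.
```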
Multilayer Perceptron [4]
Convolutional Neural Networks [4]
Effect of Architecture (Hyp. 2)

Observation

MNIST
– MLP: discrimination is solved greedily in the early layers
– PMLP and CNN: discrimination is postponed to the last layers

CIFAR
– MLP: does not discriminate until the last layer
– PMLP and CNN: discrimination is spread across more layers

Why?
– Good observation, but no explanation!
– Hints: dataset, priors, etc.?
Effect of Architecture (Cont.)

Observation

Regularities imposed by PMLP and CNN
– facilitate the construction of a structured solution
– control the rate of discrimination at every layer
Label Contribution of PCs
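One way to make "label contribution of PCs" concrete, in the spirit of the relevant-dimensions analysis of [3]: project the centered one-hot label indicators onto the leading kernel principal components and see how much each component accounts for. The dataset, kernel, and number of components below are illustrative assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import KernelPCA
from sklearn.preprocessing import label_binarize

X, y = load_digits(return_X_y=True)
Y = label_binarize(y, classes=np.unique(y)).astype(float)  # one-hot labels

# Kernel PCA on the inputs (or on any layer's representation).
kpca = KernelPCA(n_components=40, kernel="rbf", gamma=1e-3)
Z = kpca.fit_transform(X)                  # projections onto the leading PCs
Z = Z / np.linalg.norm(Z, axis=0)          # unit-norm component directions

# Contribution of each kernel PC to the labels: squared projection of the
# centered label indicators onto that component, summed over classes.
Yc = Y - Y.mean(axis=0)
contrib = np.sum((Z.T @ Yc) ** 2, axis=1)
contrib /= contrib.sum()
for i, c in enumerate(contrib[:10], 1):
    print(f"PC {i:2d}: {100 * c:.1f}% of the captured label variance")
```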
Comments

Strengths
– Important and interesting problem
– Simple and intuitive approach
– Well-designed experiments
– Good analysis of results
Weaknesses
– Too many observations
  • e.g. the role of sigma in scale invariance
– Observations are left largely unexplained
Future work?

Experiments on unsupervised learning

Explaining the results

Analysis of biological neural systems?!
References
1) Bengio, Yoshua, and Olivier Delalleau. "On the expressive power of deep architectures." Algorithmic Learning Theory. Springer Berlin Heidelberg, 2011.
2) Poon, Hoifung, and Pedro Domingos. "Sum-product networks: A new deep architecture." 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops). IEEE, 2011.
3) Braun, Mikio L., Joachim M. Buhmann, and Klaus-Robert Müller. "On relevant dimensions in kernel feature spaces." Journal of Machine Learning Research 9 (2008): 1875-1908.
4) http://deeplearning.net/
Thanks ...