Building high-level features using
large-scale unsupervised learning
Anh Nguyen, Bay-yuan Hsu
CS290D – Data Mining (Spring 2014)
University of California, Santa Barbara
Slides adapted from Andrew Ng (Stanford) and Nando de Freitas (UBC)
Agenda
1. Motivation
2. Approach
   1. Sparse Deep Auto-encoder
   2. Local Receptive Field
   3. L2 Pooling
   4. Local Contrast Normalization
   5. Overall Model
3. Parallelism
4. Evaluation
5. Discussion
1. MOTIVATION
Motivation
• Feature learning
• Supervised learning
• Needs a large amount of labeled data
• Unsupervised learning
• Example: build a face detector without any
labeled face images
• Building high-level features using unlabeled data.
Motivation
• Previous work
• Auto-encoder
• Sparse coding
• Result: only learns low-level features
• Reason: computational constraints
• This paper's approach: scale up
• Dataset
• Model
• Computational resources
2. APPROACH
Sparse Deep Auto-encoder
• Auto-encoder
• Neural network
• Unsupervised learning
• Back-propagation
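A minimal numpy sketch of such an auto-encoder: one sigmoid hidden layer trained by back-propagation to reproduce its own input (data, layer sizes, and learning rate are made up for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 64))   # toy data: 100 flattened 8x8 patches

    n_in, n_hid = 64, 25                 # hypothetical layer sizes
    W1 = rng.standard_normal((n_in, n_hid)) * 0.1   # encoder weights
    W2 = rng.standard_normal((n_hid, n_in)) * 0.1   # decoder weights
    lr = 0.01

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    for epoch in range(200):
        H = sigmoid(X @ W1)              # encode
        X_hat = H @ W2                   # decode (linear output layer)
        err = X_hat - X                  # reconstruction error
        # Back-propagate the squared reconstruction loss.
        dW2 = H.T @ err
        dH = (err @ W2.T) * H * (1 - H)  # chain rule through the sigmoid
        dW1 = X.T @ dH
        W1 -= lr * dW1 / len(X)
        W2 -= lr * dW2 / len(X)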
Sparse Deep Auto-encoder (cont'd)
• Sparse Coding
• Input: images x(1), x(2), ..., x(m)
• Learn: bases (features) f1, f2, ..., fk, so that each
input x can be approximately decomposed as
x ≈ Σj aj fj, where the coefficients aj are mostly zero ("sparse")
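A sketch of inferring the sparse coefficients for one input, using the standard L1 formulation solved by ISTA (this is generic sparse coding, not the paper's training procedure; names and constants are made up):

    import numpy as np

    def ista(x, F, lam=0.1, n_steps=100):
        """Find sparse a with x ~ F @ a by iterative soft-thresholding."""
        L = np.linalg.norm(F, 2) ** 2        # Lipschitz constant of the gradient
        a = np.zeros(F.shape[1])
        for _ in range(n_steps):
            a = a - (F.T @ (F @ a - x)) / L  # gradient step on 0.5*||x - F a||^2
            a = np.sign(a) * np.maximum(np.abs(a) - lam / L, 0.0)  # soft-threshold
        return a

    rng = np.random.default_rng(0)
    F = rng.standard_normal((64, 128))       # 128 bases f_j as columns
    x = rng.standard_normal(64)
    a = ista(x, F)                           # most entries of a end up zero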
Sparse Deep Auto-encoder (cont'd)
• Sparse coding
• Regularizer: penalizes non-zero coefficients aj
(see the objective below)
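The regularizer itself appeared as a formula image on the slide; a standard L1-regularized sparse coding objective, as in Ng's sparse coding notes (the slide's exact form may differ), is:

    \min_{f,\,a}\; \sum_{i=1}^{m} \Big\| x^{(i)} - \sum_{j=1}^{k} a_j^{(i)} f_j \Big\|_2^2
        + \lambda \sum_{i=1}^{m} \sum_{j=1}^{k} \big| a_j^{(i)} \big|

The L1 term drives most coefficients aj to exactly zero, which is what makes the codes sparse.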
Sparse Deep Auto-encoder (cont'd)
• Sparse Deep Auto-encoder
• Stacks multiple hidden layers to give the learned
features particular characteristics
Local Receptive Field
• Definition: Each feature in
the autoencoder can
connect only to a small
region of the lower layer
• Goal:
• Learn features efficiently
• Parallelism
• Training on small image
patches
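A minimal sketch of a locally connected (untied) layer, assuming a 1-D input for brevity; each hidden unit sees only a small window, and unlike a convolution the filters are not shared across positions:

    import numpy as np

    rng = np.random.default_rng(0)

    n_in, field = 100, 10               # input length and receptive-field size
    n_hid = n_in - field + 1            # one hidden unit per window position
    # Untied, locally connected weights: one separate filter per unit.
    W = rng.standard_normal((n_hid, field)) * 0.1

    def local_layer(x):
        # Unit j connects only to the small window x[j : j + field].
        return np.array([W[j] @ x[j:j + field] for j in range(n_hid)])

    h = local_layer(rng.standard_normal(n_in))   # shape (91,)

Because each unit depends on only a small input region, units over disjoint regions can be computed (and stored) independently, which is what enables the parallelism discussed later.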
L2 Pooling
• Goal: robustness to local distortions
• Approach: group similar features together to
achieve invariance (see the sketch below)
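A numpy sketch of L2 pooling over non-overlapping feature groups (the group size is a made-up parameter). The pooled value is the square root of the summed squares, so the response barely changes when activity moves between features inside a group:

    import numpy as np

    def l2_pool(h, group=4):
        """L2-pool a feature vector over non-overlapping groups."""
        return np.sqrt((h.reshape(-1, group) ** 2).sum(axis=1))

    h = np.array([1.0, 0.0, 0.0, 0.0,    # all activity on one feature...
                  0.5, 0.5, 0.5, 0.5])   # ...or spread across the group
    print(l2_pool(h))                    # [1. 1.] -- same pooled response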
Local Contrast Normalization
• Goal: robustness to variations in light intensity
• Approach: normalize the local contrast
(see the sketch below)
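A sketch of local contrast normalization on a grayscale image, using SciPy's uniform_filter as a box-window stand-in for a Gaussian weighting (window size and epsilon are made up): subtract the local mean, then divide by the local standard deviation.

    import numpy as np
    from scipy.ndimage import uniform_filter

    def local_contrast_norm(img, size=9, eps=1e-3):
        """Subtractive then divisive normalization over a size x size window."""
        centered = img - uniform_filter(img, size)         # remove local mean
        std = np.sqrt(uniform_filter(centered ** 2, size))
        return centered / np.maximum(std, eps)             # divide by local std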
Overall Model
• 3 layers, each containing:
• Simple (filtering): 18x18 px receptive fields,
8 neurons per patch
• Complex (L2 pooling): 5x5 px
• LCN: 5x5 px
Overall Model
• Train: each layer is trained to
reconstruct its own input
• Optimization function (see the sketch below)
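The optimization function on this slide was a figure; in the Le et al. paper the per-layer objective combines reconstruction of the input with sparsity of the L2-pooled responses. Roughly (my transcription; W_e and W_d are the encoding and decoding weights, and H_j sums the squared responses in the j-th pooling group):

    \min_{W_d,\,W_e}\; \sum_{i=1}^{m} \left(
        \big\| W_d W_e x^{(i)} - x^{(i)} \big\|_2^2
        + \lambda \sum_{j=1}^{k} \sqrt{\epsilon + H_j \big( W_e x^{(i)} \big)^2}
    \right)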
Overall Model
• Complex model?
3. PARALLELISM
Asynchronous SGD
• Two recent lines of research in speeding up
large learning problems:
• Parallel/distributed computing
• Online (and mini-batch) learning algorithms:
stochastic gradient descent, perceptron, MIRA,
stepwise EM
How can we bring together the benefits of
parallel computing and online learning?
Asynchronous SGD
SGD: Stochastic Gradient Descent
• Choose an initial parameter vector W and a
learning rate α
• Repeat until an approximate minimum is
obtained:
• Randomly shuffle the examples in the training set
• For each example i: W := W − α ∇Qi(W)
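A minimal numpy sketch of this loop on a made-up least-squares problem (here Qi is the squared error on example i):

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy problem: find W minimizing sum_i (x_i . W - y_i)^2.
    X = rng.standard_normal((500, 3))
    y = X @ np.array([2.0, -1.0, 0.5])

    W = np.zeros(3)    # initial parameter vector
    alpha = 0.01       # learning rate

    for epoch in range(20):
        for i in rng.permutation(len(X)):     # randomly shuffle the examples
            grad = (X[i] @ W - y[i]) * X[i]   # gradient of the i-th loss term
            W -= alpha * grad                 # update: W := W - alpha * grad_i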
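The asynchronous variant lets workers update shared parameters without waiting for one another. A toy sketch using Python threads as stand-in workers (the shared vector W plays the parameter-server role; problem, sharding, and step size are all made up):

    import threading
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy least-squares problem: find W with X @ W ~ y.
    X = rng.standard_normal((1000, 5))
    y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0])

    W = np.zeros(5)             # shared parameters ("parameter server")
    lock = threading.Lock()     # protects only the in-place update itself

    def worker(shard_id, n_workers=4, lr=0.01, steps=2000):
        shard = np.arange(shard_id, len(X), n_workers)   # this worker's data
        local_rng = np.random.default_rng(shard_id)
        for i in local_rng.choice(shard, size=steps):
            grad = (X[i] @ W - y[i]) * X[i]   # W read here may be slightly stale
            with lock:
                W[:] -= lr * grad             # asynchronous push to the shared W

    threads = [threading.Thread(target=worker, args=(s,)) for s in range(4)]
    for t in threads: t.start()
    for t in threads: t.join()

Workers compute gradients against slightly stale parameters, trading strict consistency for throughput.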
Model Parallelism
• Weights are partitioned according to image
locality and stored on different machines
(see the sketch below)
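A toy sketch of the idea, assuming a locally connected layer like the one sketched earlier: because each unit touches only a small image region, the weight matrix splits cleanly by region, and each machine holds and updates only its own slice (strip count and sizes are made up):

    import numpy as np

    rng = np.random.default_rng(0)
    image = rng.standard_normal((200, 200))

    # Split the image into four vertical strips; "machine" k stores only
    # the weights of units whose receptive fields lie inside strip k.
    strips = np.hsplit(image, 4)                   # four 200x50 strips
    weights = [rng.standard_normal((8, s.size)) * 0.01 for s in strips]

    def machine_forward(w, strip):
        # Runs independently on one machine: its units read only its strip.
        return np.tanh(w @ strip.reshape(-1))

    parts = [machine_forward(w, s) for w, s in zip(weights, strips)]
    h = np.concatenate(parts)   # layer output assembled from the machines

Only units whose receptive fields straddle a strip boundary would need cross-machine communication.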
4. EVALUATION
Evaluation
• 10M unlabeled YouTube frames of size
200x200
• 1B parameters
• 1000 machines
• 16,000 cores
Experiment on Faces
• Test set
• 37,000 images
• 13,026 face images
• Metric: accuracy of the single best neuron
(see the sketch below)
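"Best neuron" is the paper's evaluation protocol: pick, with hindsight, the single neuron and activation threshold that best separate faces from distractors. A numpy sketch (the function name and percentile threshold grid are my own):

    import numpy as np

    def best_neuron_accuracy(acts, labels):
        """acts: (n_images, n_neurons) activations; labels: (n_images,) 0/1."""
        best = 0.0
        for j in range(acts.shape[1]):
            for t in np.percentile(acts[:, j], np.arange(1, 100)):
                pred = acts[:, j] > t            # threshold rule for neuron j
                acc = max((pred == labels).mean(), (pred != labels).mean())
                best = max(best, acc)            # keep best neuron/threshold
        return best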
Experiment on Faces (cont'd)
• Visualization
• Top stimuli (test images) for the face neuron
• Optimal stimulus for the face neuron
Experiment on Faces (cont'd)
• Invariance properties
Experiment on Cat/Human body
• Test set
• Cat: 10,000 positive, 18,409 negative
• Human body: 13,026 positive, 23,974 negative
• Accuracy
ImageNet classification
• Recognizing images
• Dataset
• 20,000 categories
• 14M images
• Accuracy
• This work: 15.8%
• Previous state of the art: 9.3%
5. DISCUSSION
Discussion
• Deep learning
• Unsupervised feature learning
• Learning multiple layers of representation
• Accuracy gains from invariance (pooling) and
local contrast normalization
• Scalability
6. REFERENCES
References
1. Quoc Le et al., "Building High-level Features using Large Scale Unsupervised Learning"
2. Nando de Freitas, "Deep Learning", URL: https://www.youtube.com/watch?v=g4ZmJJWR34Q
3. Andrew Ng, "Sparse autoencoder", URL: http://www.stanford.edu/class/archive/cs/cs294a/cs294a.1104/sparseAutoencoder.pdf
4. Andrew Ng, "Machine Learning and AI via Brain Simulations", URL: https://forum.stanford.edu/events/2011slides/plenary/2011plenaryNg.pdf
5. Andrew Ng, "Deep Learning", URL: http://www.ipam.ucla.edu/publications/gss2012/gss2012_10595.pdf