Building high-level features using large-scale unsupervised learning
Anh Nguyen, Bay-yuan Hsu
CS290D – Data Mining (Spring 2014)
University of California, Santa Barbara
Slides adapted from Andrew Ng (Stanford) and Nando de Freitas (UBC)

Agenda
1. Motivation
2. Approach
   1. Sparse deep autoencoder
   2. Local receptive fields
   3. L2 pooling
   4. Local contrast normalization
   5. Overall model
3. Parallelism
4. Evaluation
5. Discussion

1. MOTIVATION

Motivation
• Feature learning
  • Supervised learning: needs a large number of labeled examples
  • Unsupervised learning
    • Example: build a face detector without any labeled face images
• Goal: build high-level features from unlabeled data

Motivation
• Previous work: autoencoders, sparse coding
  • Result: only low-level features are learned
  • Reason: computational constraints
• This work scales up all three: the dataset, the model, and the computational resources

2. APPROACH

Sparse Deep Autoencoder
• Autoencoder
  • Neural network
  • Unsupervised learning
  • Trained with back-propagation

Sparse Deep Autoencoder (cont'd)
• Sparse coding
  • Input: images x(1), x(2), ..., x(m)
  • Learn: bases (features) f1, f2, ..., fk so that each input x can be approximately decomposed as x ≈ Σj aj fj, where the coefficients aj are mostly zero ("sparse")
  • Sparsity is enforced with a regularizer on the coefficients

Sparse Deep Autoencoder (cont'd)
• Sparse deep autoencoder: multiple hidden layers are stacked so that the learned features acquire particular desired characteristics

Local Receptive Fields
• Definition: each feature in the autoencoder connects only to a small region of the layer below
• Goals
  • Learn features efficiently
  • Enable parallelism
  • Train on small image patches

L2 Pooling
• Goal: robustness to local distortions
• Approach: group similar features together to achieve invariance

Local Contrast Normalization
• Goal: robustness to variations in light intensity
• Approach: normalize the contrast within each local neighborhood

Overall Model
• 3 layers, each built from three sub-layers:
  • Simple (local filtering): 18×18 px receptive fields, 8 neurons per patch
  • Complex (L2 pooling): 5×5 px
  • Local contrast normalization: 5×5 px

Overall Model (cont'd)
• Training: each layer is trained to reconstruct its own input
• Optimization function: reconstruction error (plus regularization)
• Why such a complex model?

3. PARALLELISM

Asynchronous SGD
Two recent lines of research in speeding up large learning problems:
• Parallel/distributed computing
• Online (and mini-batch) learning algorithms: stochastic gradient descent, perceptron, MIRA, stepwise EM
How can we bring together the benefits of parallel computing and online learning?

Asynchronous SGD (cont'd)
SGD (stochastic gradient descent):
• Choose an initial parameter vector W and a learning rate α
• Repeat until an approximate minimum is reached:
  • Randomly shuffle the examples in the training set
  • For each example i, update W ← W − α ∇Qi(W)
• In the asynchronous variant, many workers run these updates in parallel against shared parameters (a toy sketch follows this section)

Model Parallelism
• The weights are divided according to image locality and stored on different machines
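The deck gives no code for asynchronous SGD, so here is a minimal Python sketch of the idea using threads and a toy least-squares model. Everything in it is hypothetical and for illustration only: the ParameterServer class, the worker function, the learning rate, and the data are all invented, and the actual system in the paper runs on roughly 1,000 machines rather than threads in one process.

import threading
import numpy as np

class ParameterServer:
    """Holds the shared parameter vector W and applies updates under a lock."""
    def __init__(self, dim, lr=0.05):
        self.W = np.zeros(dim)
        self.lr = lr
        self.lock = threading.Lock()

    def pull(self):
        # Workers fetch a snapshot of the current parameters.
        with self.lock:
            return self.W.copy()

    def push(self, grad):
        # Workers push gradients; the SGD update is W <- W - alpha * grad.
        with self.lock:
            self.W -= self.lr * grad

def worker(server, X, y, steps, batch=32):
    """Repeatedly pull (possibly stale) parameters, compute a mini-batch
    gradient on this worker's data, and push the update asynchronously."""
    rng = np.random.default_rng()
    for _ in range(steps):
        idx = rng.choice(len(X), size=batch, replace=False)
        W = server.pull()
        residual = X[idx] @ W - y[idx]
        grad = X[idx].T @ residual / batch      # least-squares gradient
        server.push(grad)

if __name__ == "__main__":
    # Toy data: a linear model stands in for the real autoencoder.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 20))
    true_W = rng.normal(size=20)
    y = X @ true_W
    server = ParameterServer(dim=20)
    threads = [threading.Thread(target=worker, args=(server, X, y, 200))
               for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("parameter error:", np.linalg.norm(server.pull() - true_W))

In the full system the model itself is also partitioned across machines according to image locality (model parallelism), so the threads above are only a stand-in for whole machines.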
4. EVALUATION

Evaluation
• 10M unlabeled YouTube frames, 200×200 pixels each
• 1B parameters
• 1,000 machines
• 16,000 cores

Experiment on Faces
• Test set: 37,000 images, of which 13,026 contain faces
• Metric: accuracy of the best single neuron (a sketch of this evaluation is in the appendix at the end of the deck)

Experiment on Faces (cont'd)
• Visualization
  • Top stimuli (images) for the face neuron
  • Optimal stimulus for the face neuron

Experiment on Faces (cont'd)
• Invariance properties

Experiment on Cat / Human Body Detectors
• Test sets
  • Cat: 10,000 positive, 18,409 negative images
  • Human body: 13,026 positive, 23,974 negative images
• Metric: accuracy

ImageNet Classification
• Recognizing images across 20,000 categories (14M images)
• Accuracy: 15.8%
• Previous state of the art: 9.3%

5. DISCUSSION

Discussion
• Deep learning
  • Unsupervised feature learning
  • Learning multiple layers of representation
• Accuracy improved by invariance (pooling) and contrast normalization
• Scalability

6. REFERENCES

References
1. Quoc Le et al., "Building High-level Features using Large Scale Unsupervised Learning".
2. Nando de Freitas, "Deep Learning", https://www.youtube.com/watch?v=g4ZmJJWR34Q
3. Andrew Ng, "Sparse Autoencoder", http://www.stanford.edu/class/archive/cs/cs294a/cs294a.1104/sparseAutoencoder.pdf
4. Andrew Ng, "Machine Learning and AI via Brain Simulations", https://forum.stanford.edu/events/2011slides/plenary/2011plenaryNg.pdf
5. Andrew Ng, "Deep Learning", http://www.ipam.ucla.edu/publications/gss2012/gss2012_10595.pdf
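Appendix: best-neuron evaluation (illustrative sketch)

The deck does not spell out how the "best neuron" accuracy on the face test set is computed. The Python sketch below shows one plausible reading under stated assumptions: for every neuron, sweep a threshold over its activations on the labeled test images and keep the accuracy of the single best neuron/threshold pair. The function name, the array shapes, and the random toy data are invented for illustration.

import numpy as np

def best_neuron_accuracy(activations, labels):
    """activations: (n_images, n_neurons) array of top-layer activations.
    labels: 1 for a face image, 0 for a distractor."""
    best = 0.0
    for j in range(activations.shape[1]):
        a = activations[:, j]
        for t in np.unique(a):
            # Try the neuron both as a "fires for faces" detector and as a
            # "fires for distractors" detector at threshold t.
            acc = max(np.mean((a >= t) == labels),
                      np.mean((a < t) == labels))
            best = max(best, acc)
    return best

# Toy usage: random activations stand in for the network's output.
rng = np.random.default_rng(0)
acts = rng.normal(size=(500, 10))
labels = rng.integers(0, 2, size=500)
print("best single-neuron accuracy:", best_neuron_accuracy(acts, labels))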